On Tue, Dec 02, 2025 at 02:42:21PM +0800, Yingchao Deng wrote:
The QCOM extended CTI is a heavily parameterized version of ARM’s CSCTI. It allows a debugger to send to trigger events to a processor or to send a trigger event to one or more processors when a trigger event occurs on another processor on the same SoC, or even between SoCs. Qualcomm CTI implementation differs from the standard CTI in the following aspects:
- The number of supported triggers is extended to 128.
- Several register offsets differ from the CoreSight specification.
I apologize for my late review of this series. For easier maintenance later, I have several comments for register access.
[...]
+static const u32 cti_normal_offset[] = {
- [INDEX_CTIINTACK] = CTIINTACK,
- [INDEX_CTIAPPSET] = CTIAPPSET,
- [INDEX_CTIAPPCLEAR] = CTIAPPCLEAR,
- [INDEX_CTIAPPPULSE] = CTIAPPPULSE,
- [INDEX_CTIINEN] = CTIINEN(0),
- [INDEX_CTIOUTEN] = CTIOUTEN(0),
I prefer to update the these two macros to CTIINENn and CTIOUTENn, as later we will not use CTIINEN(n) and CTIOUTEN(n) anymore.
- [INDEX_CTITRIGINSTATUS] = CTITRIGINSTATUS,
- [INDEX_CTITRIGOUTSTATUS] = CTITRIGOUTSTATUS,
- [INDEX_CTICHINSTATUS] = CTICHINSTATUS,
- [INDEX_CTICHOUTSTATUS] = CTICHOUTSTATUS,
- [INDEX_CTIGATE] = CTIGATE,
- [INDEX_ASICCTL] = ASICCTL,
- [INDEX_ITCHINACK] = ITCHINACK,
- [INDEX_ITTRIGINACK] = ITTRIGINACK,
- [INDEX_ITCHOUT] = ITCHOUT,
- [INDEX_ITTRIGOUT] = ITTRIGOUT,
- [INDEX_ITCHOUTACK] = ITCHOUTACK,
- [INDEX_ITTRIGOUTACK] = ITTRIGOUTACK,
- [INDEX_ITCHIN] = ITCHIN,
- [INDEX_ITTRIGIN] = ITTRIGIN,
- [INDEX_ITCTRL] = CORESIGHT_ITCTRL,
+};
+static const u32 cti_extended_offset[] = {
- [INDEX_CTIINTACK] = QCOM_CTIINTACK,
- [INDEX_CTIAPPSET] = QCOM_CTIAPPSET,
- [INDEX_CTIAPPCLEAR] = QCOM_CTIAPPCLEAR,
- [INDEX_CTIAPPPULSE] = QCOM_CTIAPPPULSE,
- [INDEX_CTIINEN] = QCOM_CTIINEN,
- [INDEX_CTIOUTEN] = QCOM_CTIOUTEN,
- [INDEX_CTITRIGINSTATUS] = QCOM_CTITRIGINSTATUS,
- [INDEX_CTITRIGOUTSTATUS] = QCOM_CTITRIGOUTSTATUS,
- [INDEX_CTICHINSTATUS] = QCOM_CTICHINSTATUS,
- [INDEX_CTICHOUTSTATUS] = QCOM_CTICHOUTSTATUS,
- [INDEX_CTIGATE] = QCOM_CTIGATE,
- [INDEX_ASICCTL] = QCOM_ASICCTL,
- [INDEX_ITCHINACK] = QCOM_ITCHINACK,
- [INDEX_ITTRIGINACK] = QCOM_ITTRIGINACK,
- [INDEX_ITCHOUT] = QCOM_ITCHOUT,
- [INDEX_ITTRIGOUT] = QCOM_ITTRIGOUT,
- [INDEX_ITCHOUTACK] = QCOM_ITCHOUTACK,
- [INDEX_ITTRIGOUTACK] = QCOM_ITTRIGOUTACK,
- [INDEX_ITCHIN] = QCOM_ITCHIN,
- [INDEX_ITTRIGIN] = QCOM_ITTRIGIN,
- [INDEX_ITCTRL] = CORESIGHT_ITCTRL,
+};
I saw CTI registers are within 4KiB (0x1000), we can don't convert standard regiserts and only convert to QCOM register based on the standard ones. So you can drop the cti_normal_offset strucuture and only have a cti_reg_qcom_offset[] struct:
static const u32 cti_extended_offset[] = { [CTIINTACK] = QCOM_CTIINTACK, [CTIAPPSET] = QCOM_CTIAPPSET, [CTIAPPCLEAR] = QCOM_CTIAPPCLEAR, [CTIAPPPULSE] = QCOM_CTIAPPPULSE, [CTIINEN] = QCOM_CTIINEN, ... };
Then you could create two helpers for register address:
static void __iomem *cti_reg_addr_with_nr(struct cti_drvdata *drvdata, u32 reg, u32 nr) { /* convert to qcom specific offset */ if (unlikely(drvdata->is_qcom_cti)) reg = cti_extended_offset[reg];
return drvdata->base + reg + sizeof(u32) * nr; }
static void __iomem *cti_reg_addr(struct cti_drvdata *drvdata, u32 reg) { return cti_reg_addr_with_nr(drvdata, reg, 0); }
/*
- CTI devices can be associated with a PE, or be connected to CoreSight
@@ -70,15 +119,16 @@ void cti_write_all_hw_regs(struct cti_drvdata *drvdata) /* write the CTI trigger registers */ for (i = 0; i < config->nr_trig_max; i++) {
writel_relaxed(config->ctiinen[i], drvdata->base + CTIINEN(i));
writel_relaxed(config->ctiinen[i],drvdata->base + cti_offset(drvdata, INDEX_CTIINEN, i));
writel_relaxed(config->ctiinen[i], cti_reg_addr_with_nr(drvdata, CTIINENn, i));
And apply for the same cases below.
/* other regs */
- writel_relaxed(config->ctigate, drvdata->base + CTIGATE);
- writel_relaxed(config->asicctl, drvdata->base + ASICCTL);
- writel_relaxed(config->ctiappset, drvdata->base + CTIAPPSET);
- writel_relaxed(config->ctigate, drvdata->base + cti_offset(drvdata, INDEX_CTIGATE, 0));
writel_relaxed(config->ctigate, cti_reg_addr(drvdata, CTIGATE));
And apply for the same cases below.
[...]
@@ -394,8 +447,8 @@ int cti_channel_trig_op(struct device *dev, enum cti_chan_op op, /* update the local register values */ chan_bitmask = BIT(channel_idx);
- reg_offset = (direction == CTI_TRIG_IN ? CTIINEN(trigger_idx) :
CTIOUTEN(trigger_idx));
- reg_offset = (direction == CTI_TRIG_IN ? cti_offset(drvdata, INDEX_CTIINEN, trigger_idx) :
cti_offset(drvdata, INDEX_CTIOUTEN, trigger_idx));
For readable, we can improve a bit with code alignment:
reg_offset = (direction == CTI_TRIG_IN) ? cti_reg_addr_with_nr(drvdata, CTIINENn, trigger_idx) : cti_reg_addr_with_nr(drvdata, CTIOUTENn, trigger_idx);
[...]
@@ -981,9 +1035,28 @@ static int cti_probe(struct amba_device *adev, const struct amba_id *id) drvdata->csdev_release = drvdata->csdev->dev.release; drvdata->csdev->dev.release = cti_device_release;
- /* check architect value*/
- devarch = readl_relaxed(drvdata->base + CORESIGHT_DEVARCH);
- if (CTI_DEVARCH_ARCHITECT(devarch) == ARCHITECT_QCOM) {
drvdata->subtype = QCOM_CTI;drvdata->offsets = cti_extended_offset;
As a result, we can only set the is_qcom_cti flag:
drvdata->is_qcom_cti = true;
/** QCOM CTI does not implement Claimtag functionality as* per CoreSight specification, but its CLAIMSET register* is incorrectly initialized to 0xF. This can mislead* tools or drivers into thinking the component is claimed.** Reset CLAIMSET to 0 to reflect that no claims are active.*/writel_relaxed(0, drvdata->base + CORESIGHT_CLAIMSET);
I am confused for this. If QCOM CTI does not implement claim tag, then what is the designed register at the offset CORESIGHT_CLAIMSET?
Should you bypass all claim tag related operations for QCOM CTI case? (I don't see you touch anything for claim and declaim tags).
- } else {
drvdata->subtype = ARM_STD_CTI;drvdata->offsets = cti_normal_offset;- }
- /* all done - dec pm refcount */ pm_runtime_put(&adev->dev);
- dev_info(&drvdata->csdev->dev, "CTI initialized\n");
- dev_info(&drvdata->csdev->dev, "CTI initialized; subtype=%d\n", drvdata->subtype);
dev_info(&drvdata->csdev->dev, "%s CTI initialized\n", drvdata->is_qcom_cti ? "QCOM" : "");
return 0; pm_release: diff --git a/drivers/hwtracing/coresight/coresight-cti-sysfs.c b/drivers/hwtracing/coresight/coresight-cti-sysfs.c index a9df77215141..12a495382999 100644 --- a/drivers/hwtracing/coresight/coresight-cti-sysfs.c +++ b/drivers/hwtracing/coresight/coresight-cti-sysfs.c @@ -172,9 +172,8 @@ static struct attribute *coresight_cti_attrs[] = { /* register based attributes */ -/* Read registers with power check only (no enable check). */ -static ssize_t coresight_cti_reg_show(struct device *dev,
struct device_attribute *attr, char *buf)+static ssize_t coresight_cti_mgmt_reg_show(struct device *dev,
struct device_attribute *attr, char *buf){ struct cti_drvdata *drvdata = dev_get_drvdata(dev->parent); struct cs_off_attribute *cti_attr = container_of(attr, struct cs_off_attribute, attr); @@ -189,6 +188,40 @@ static ssize_t coresight_cti_reg_show(struct device *dev, return sysfs_emit(buf, "0x%x\n", val); } +/* Read registers with power check only (no enable check). */ +static ssize_t coresight_cti_reg_show(struct device *dev,
struct device_attribute *attr, char *buf)+{
- struct cti_drvdata *drvdata = dev_get_drvdata(dev->parent);
- struct cs_off_attribute *cti_attr = container_of(attr, struct cs_off_attribute, attr);
- u32 idx, val = 0;
- pm_runtime_get_sync(dev->parent);
- raw_spin_lock(&drvdata->spinlock);
- idx = drvdata->config.ext_reg_sel;
- if (drvdata->config.hw_powered) {
switch (cti_attr->off) {case INDEX_CTITRIGINSTATUS:case INDEX_CTITRIGOUTSTATUS:case INDEX_ITTRIGINACK:case INDEX_ITTRIGOUT:case INDEX_ITTRIGOUTACK:case INDEX_ITTRIGIN:val = readl_relaxed(drvdata->base +cti_offset(drvdata, cti_attr->off, idx));break;default:val = readl_relaxed(drvdata->base + cti_offset(drvdata, cti_attr->off, 0));break;}- }
- raw_spin_unlock(&drvdata->spinlock);
- pm_runtime_put_sync(dev->parent);
- return sysfs_emit(buf, "0x%x\n", val);
+}
/* Write registers with power check only (no enable check). */ static __maybe_unused ssize_t coresight_cti_reg_store(struct device *dev, struct device_attribute *attr, @@ -197,19 +230,39 @@ static __maybe_unused ssize_t coresight_cti_reg_store(struct device *dev, struct cti_drvdata *drvdata = dev_get_drvdata(dev->parent); struct cs_off_attribute *cti_attr = container_of(attr, struct cs_off_attribute, attr); unsigned long val = 0;
- u32 idx;
if (kstrtoul(buf, 0, &val)) return -EINVAL; pm_runtime_get_sync(dev->parent); raw_spin_lock(&drvdata->spinlock);
- if (drvdata->config.hw_powered)
cti_write_single_reg(drvdata, cti_attr->off, val);
- idx = drvdata->config.ext_reg_sel;
- if (drvdata->config.hw_powered) {
switch (cti_attr->off) {case INDEX_ITTRIGINACK:case INDEX_ITTRIGOUT:cti_write_single_reg(drvdata, cti_offset(drvdata, cti_attr->off, idx), val);break;default:cti_write_single_reg(drvdata, cti_offset(drvdata, cti_attr->off, 0), val);break;}- }
For both coresight_cti_reg_show() and coresight_cti_reg_store(), can we always use "cti_attr->off" as the offset for regitser access? I mean we don't need the extra config.ext_reg_sel, eventually any register we can calculate a offset for it.
raw_spin_unlock(&drvdata->spinlock); pm_runtime_put_sync(dev->parent); return size; } +#define coresight_cti_mgmt_reg(name, offset) \
- (&((struct cs_off_attribute[]) { \
{ \__ATTR(name, 0444, coresight_cti_mgmt_reg_show, NULL), \offset \} \- })[0].attr.attr)
#define coresight_cti_reg(name, offset) \ (&((struct cs_off_attribute[]) { \ { \ @@ -237,17 +290,17 @@ static __maybe_unused ssize_t coresight_cti_reg_store(struct device *dev, /* coresight management registers */ static struct attribute *coresight_cti_mgmt_attrs[] = {
- coresight_cti_reg(devaff0, CTIDEVAFF0),
- coresight_cti_reg(devaff1, CTIDEVAFF1),
- coresight_cti_reg(authstatus, CORESIGHT_AUTHSTATUS),
- coresight_cti_reg(devarch, CORESIGHT_DEVARCH),
- coresight_cti_reg(devid, CORESIGHT_DEVID),
- coresight_cti_reg(devtype, CORESIGHT_DEVTYPE),
- coresight_cti_reg(pidr0, CORESIGHT_PERIPHIDR0),
- coresight_cti_reg(pidr1, CORESIGHT_PERIPHIDR1),
- coresight_cti_reg(pidr2, CORESIGHT_PERIPHIDR2),
- coresight_cti_reg(pidr3, CORESIGHT_PERIPHIDR3),
- coresight_cti_reg(pidr4, CORESIGHT_PERIPHIDR4),
- coresight_cti_mgmt_reg(devaff0, CTIDEVAFF0),
- coresight_cti_mgmt_reg(devaff1, CTIDEVAFF1),
- coresight_cti_mgmt_reg(authstatus, CORESIGHT_AUTHSTATUS),
- coresight_cti_mgmt_reg(devarch, CORESIGHT_DEVARCH),
- coresight_cti_mgmt_reg(devid, CORESIGHT_DEVID),
- coresight_cti_mgmt_reg(devtype, CORESIGHT_DEVTYPE),
- coresight_cti_mgmt_reg(pidr0, CORESIGHT_PERIPHIDR0),
- coresight_cti_mgmt_reg(pidr1, CORESIGHT_PERIPHIDR1),
- coresight_cti_mgmt_reg(pidr2, CORESIGHT_PERIPHIDR2),
- coresight_cti_mgmt_reg(pidr3, CORESIGHT_PERIPHIDR3),
- coresight_cti_mgmt_reg(pidr4, CORESIGHT_PERIPHIDR4),
I don't see any benefit for updating from coresight_cti_reg() to coresight_cti_mgmt_reg(). If really want to do this, should remove the macro coresight_cti_reg()?
NULL, }; @@ -258,13 +311,15 @@ static struct attribute *coresight_cti_mgmt_attrs[] = {
- If inaccessible & pcached_val not NULL then show cached value.
*/ static ssize_t cti_reg32_show(struct device *dev, char *buf,
u32 *pcached_val, int reg_offset)
u32 *pcached_val, int index)
We don't need to change anything for this. The passed "reg_offset" should be always a final offset, no matter for standard CTI or QCOM case, the driver directly uses the offset for register access.
[...]
+/*
- QCOM CTI supports up to 128 triggers, there are 6 registers need to be
- expanded to up to 4 instances, and ext_reg_sel can be used to indicate
- which one is in use.
- CTITRIGINSTATUS, CTITRIGOUTSTATUS,
- ITTRIGIN, ITTRIGOUT,
- ITTRIGINACK, ITTRIGOUTACK.
- */
+static ssize_t ext_reg_sel_show(struct device *dev,
struct device_attribute *attr,char *buf)+{
- u32 val;
- struct cti_drvdata *drvdata = dev_get_drvdata(dev->parent);
- raw_spin_lock(&drvdata->spinlock);
- val = drvdata->config.ext_reg_sel;
- raw_spin_unlock(&drvdata->spinlock);
- return sprintf(buf, "%d\n", val);
+}
+static ssize_t ext_reg_sel_store(struct device *dev,
struct device_attribute *attr,const char *buf, size_t size)+{
- unsigned long val;
- struct cti_drvdata *drvdata = dev_get_drvdata(dev->parent);
- if (kstrtoul(buf, 0, &val))
return -EINVAL;- if (val > ((drvdata->config.nr_trig_max + 31) / 32 - 1))
return -EINVAL;- raw_spin_lock(&drvdata->spinlock);
- drvdata->config.ext_reg_sel = val;
- raw_spin_unlock(&drvdata->spinlock);
- return size;
+}
As said, I don't think the trigger register is any different from other register access. So the existed APIs would be sufficient.
As a result, we don't need to add two above functions.
Thanks, Leo
On Wed, Dec 03, 2025 at 06:29:44PM +0000, Coresight ML wrote:
[...]
+/* Read registers with power check only (no enable check). */ +static ssize_t coresight_cti_reg_show(struct device *dev,
struct device_attribute *attr, char *buf)+{
- struct cti_drvdata *drvdata = dev_get_drvdata(dev->parent);
- struct cs_off_attribute *cti_attr = container_of(attr, struct cs_off_attribute, attr);
- u32 idx, val = 0;
- pm_runtime_get_sync(dev->parent);
- raw_spin_lock(&drvdata->spinlock);
- idx = drvdata->config.ext_reg_sel;
- if (drvdata->config.hw_powered) {
switch (cti_attr->off) {case INDEX_CTITRIGINSTATUS:case INDEX_CTITRIGOUTSTATUS:case INDEX_ITTRIGINACK:case INDEX_ITTRIGOUT:case INDEX_ITTRIGOUTACK:case INDEX_ITTRIGIN:
I read again and now I understand why you need "config.ext_reg_sel" as an index for these expending registers.
I think you should extend attrs for the new adding registers:
static struct attribute *coresight_cti_regs_attrs[] = { ... coresight_cti_reg(triginstatus, CTITRIGINSTATUS), /* Qcom CTI only for triginstatus1/2/3 */ coresight_cti_reg(triginstatus1, CTITRIGINSTATUS + 0x4), coresight_cti_reg(triginstatus2, CTITRIGINSTATUS + 0x8), coresight_cti_reg(triginstatus3, CTITRIGINSTATUS + 0xc), ... }
Then, you can add a is_visible() in coresight_cti_regs_group:
static umode_t coresight_cti_regs_is_visible(struct kobject *kobj, struct attribute *attr, int n) { struct device *dev = container_of(kobj, struct device, kobj); struct cti_drvdata *drvdata = dev_get_drvdata(dev->parent);
/* Mute QCOM CTI registers for standard CTI module */ if (!drvdata->is_qcom_cti) { if (attr == &triginstatus1.attr || attr == &triginstatus2.attr || attr == &triginstatus3.attr) return 0; }
return attr->mode; }
static const struct attribute_group coresight_cti_regs_group = { .attrs = coresight_cti_regs_attrs, .name = "regs", .is_visible = coresight_cti_regs_is_visible, };
Thanks, Leo
Hi,
On Thu, 4 Dec 2025 at 08:38, Leo Yan leo.yan@arm.com wrote:
On Wed, Dec 03, 2025 at 06:29:44PM +0000, Coresight ML wrote:
[...]
+/* Read registers with power check only (no enable check). */ +static ssize_t coresight_cti_reg_show(struct device *dev,
struct device_attribute *attr, char *buf)+{
- struct cti_drvdata *drvdata = dev_get_drvdata(dev->parent);
- struct cs_off_attribute *cti_attr = container_of(attr, struct cs_off_attribute, attr);
- u32 idx, val = 0;
- pm_runtime_get_sync(dev->parent);
- raw_spin_lock(&drvdata->spinlock);
- idx = drvdata->config.ext_reg_sel;
- if (drvdata->config.hw_powered) {
switch (cti_attr->off) {case INDEX_CTITRIGINSTATUS:case INDEX_CTITRIGOUTSTATUS:case INDEX_ITTRIGINACK:case INDEX_ITTRIGOUT:case INDEX_ITTRIGOUTACK:case INDEX_ITTRIGIN:I read again and now I understand why you need "config.ext_reg_sel" as an index for these expending registers.
Having this index for these extended registers matches what we do for the INEN and OUTEN registers. This gives the user a consistent approach. We do not want the unnecessary attributes as it will increase the memory footprint for all cti instances, not just the qcom ones.
The first patch in this series works to reduce the memory footprint by only allocating resource based on the actual configuration. For example for an ARM designed CTI with 8 trigger registers, we no longer declare static 128 x 32 bit arrays for each of INEN and OUTEN which were required by the original design.
Given that there can be 10s or 100s of CTIs in a large multicore system, reducing the footprint to match the actual configuration, and offering a level of compression by using an index + single file to access a set of registers improves the efficiency of the driver.
Regards
Mike
I think you should extend attrs for the new adding registers:
static struct attribute *coresight_cti_regs_attrs[] = { ... coresight_cti_reg(triginstatus, CTITRIGINSTATUS), /* Qcom CTI only for triginstatus1/2/3 */ coresight_cti_reg(triginstatus1, CTITRIGINSTATUS + 0x4), coresight_cti_reg(triginstatus2, CTITRIGINSTATUS + 0x8), coresight_cti_reg(triginstatus3, CTITRIGINSTATUS + 0xc), ... }
Then, you can add a is_visible() in coresight_cti_regs_group:
static umode_t coresight_cti_regs_is_visible(struct kobject *kobj, struct attribute *attr, int n) { struct device *dev = container_of(kobj, struct device, kobj); struct cti_drvdata *drvdata = dev_get_drvdata(dev->parent);
/* Mute QCOM CTI registers for standard CTI module */ if (!drvdata->is_qcom_cti) { if (attr == &triginstatus1.attr || attr == &triginstatus2.attr || attr == &triginstatus3.attr) return 0; } return attr->mode;}
static const struct attribute_group coresight_cti_regs_group = { .attrs = coresight_cti_regs_attrs, .name = "regs", .is_visible = coresight_cti_regs_is_visible, };
Thanks, Leo
Hi,
On Wed, 3 Dec 2025 at 18:29, Leo Yan leo.yan@arm.com wrote:
On Tue, Dec 02, 2025 at 02:42:21PM +0800, Yingchao Deng wrote:
The QCOM extended CTI is a heavily parameterized version of ARM’s CSCTI. It allows a debugger to send to trigger events to a processor or to send a trigger event to one or more processors when a trigger event occurs on another processor on the same SoC, or even between SoCs. Qualcomm CTI implementation differs from the standard CTI in the following aspects:
- The number of supported triggers is extended to 128.
- Several register offsets differ from the CoreSight specification.
I apologize for my late review of this series. For easier maintenance later, I have several comments for register access.
[...]
+static const u32 cti_normal_offset[] = {
[INDEX_CTIINTACK] = CTIINTACK,[INDEX_CTIAPPSET] = CTIAPPSET,[INDEX_CTIAPPCLEAR] = CTIAPPCLEAR,[INDEX_CTIAPPPULSE] = CTIAPPPULSE,[INDEX_CTIINEN] = CTIINEN(0),[INDEX_CTIOUTEN] = CTIOUTEN(0),I prefer to update the these two macros to CTIINENn and CTIOUTENn, as later we will not use CTIINEN(n) and CTIOUTEN(n) anymore.
[INDEX_CTITRIGINSTATUS] = CTITRIGINSTATUS,[INDEX_CTITRIGOUTSTATUS] = CTITRIGOUTSTATUS,[INDEX_CTICHINSTATUS] = CTICHINSTATUS,[INDEX_CTICHOUTSTATUS] = CTICHOUTSTATUS,[INDEX_CTIGATE] = CTIGATE,[INDEX_ASICCTL] = ASICCTL,[INDEX_ITCHINACK] = ITCHINACK,[INDEX_ITTRIGINACK] = ITTRIGINACK,[INDEX_ITCHOUT] = ITCHOUT,[INDEX_ITTRIGOUT] = ITTRIGOUT,[INDEX_ITCHOUTACK] = ITCHOUTACK,[INDEX_ITTRIGOUTACK] = ITTRIGOUTACK,[INDEX_ITCHIN] = ITCHIN,[INDEX_ITTRIGIN] = ITTRIGIN,[INDEX_ITCTRL] = CORESIGHT_ITCTRL,+};
+static const u32 cti_extended_offset[] = {
[INDEX_CTIINTACK] = QCOM_CTIINTACK,[INDEX_CTIAPPSET] = QCOM_CTIAPPSET,[INDEX_CTIAPPCLEAR] = QCOM_CTIAPPCLEAR,[INDEX_CTIAPPPULSE] = QCOM_CTIAPPPULSE,[INDEX_CTIINEN] = QCOM_CTIINEN,[INDEX_CTIOUTEN] = QCOM_CTIOUTEN,[INDEX_CTITRIGINSTATUS] = QCOM_CTITRIGINSTATUS,[INDEX_CTITRIGOUTSTATUS] = QCOM_CTITRIGOUTSTATUS,[INDEX_CTICHINSTATUS] = QCOM_CTICHINSTATUS,[INDEX_CTICHOUTSTATUS] = QCOM_CTICHOUTSTATUS,[INDEX_CTIGATE] = QCOM_CTIGATE,[INDEX_ASICCTL] = QCOM_ASICCTL,[INDEX_ITCHINACK] = QCOM_ITCHINACK,[INDEX_ITTRIGINACK] = QCOM_ITTRIGINACK,[INDEX_ITCHOUT] = QCOM_ITCHOUT,[INDEX_ITTRIGOUT] = QCOM_ITTRIGOUT,[INDEX_ITCHOUTACK] = QCOM_ITCHOUTACK,[INDEX_ITTRIGOUTACK] = QCOM_ITTRIGOUTACK,[INDEX_ITCHIN] = QCOM_ITCHIN,[INDEX_ITTRIGIN] = QCOM_ITTRIGIN,[INDEX_ITCTRL] = CORESIGHT_ITCTRL,+};
I saw CTI registers are within 4KiB (0x1000), we can don't convert standard regiserts and only convert to QCOM register based on the standard ones. So you can drop the cti_normal_offset strucuture and only have a cti_reg_qcom_offset[] struct:
static const u32 cti_extended_offset[] = { [CTIINTACK] = QCOM_CTIINTACK, [CTIAPPSET] = QCOM_CTIAPPSET, [CTIAPPCLEAR] = QCOM_CTIAPPCLEAR, [CTIAPPPULSE] = QCOM_CTIAPPPULSE, [CTIINEN] = QCOM_CTIINEN, ... };
I suggested the dual offset approach a couple of patchset revisions ago as it actually simplifies the code & makes it more efficient. The offset array in use is set during probe and the remaining code is then common to both without lots of "if qcom else " occurences.
Regards
Mike
Then you could create two helpers for register address:
static void __iomem *cti_reg_addr_with_nr(struct cti_drvdata *drvdata, u32 reg, u32 nr) { /* convert to qcom specific offset */ if (unlikely(drvdata->is_qcom_cti)) reg = cti_extended_offset[reg]; return drvdata->base + reg + sizeof(u32) * nr; } static void __iomem *cti_reg_addr(struct cti_drvdata *drvdata, u32 reg) { return cti_reg_addr_with_nr(drvdata, reg, 0); }/*
- CTI devices can be associated with a PE, or be connected to CoreSight
@@ -70,15 +119,16 @@ void cti_write_all_hw_regs(struct cti_drvdata *drvdata)
/* write the CTI trigger registers */ for (i = 0; i < config->nr_trig_max; i++) {
writel_relaxed(config->ctiinen[i], drvdata->base + CTIINEN(i));
writel_relaxed(config->ctiinen[i],drvdata->base + cti_offset(drvdata, INDEX_CTIINEN, i));writel_relaxed(config->ctiinen[i], cti_reg_addr_with_nr(drvdata, CTIINENn, i));
And apply for the same cases below.
/* other regs */
writel_relaxed(config->ctigate, drvdata->base + CTIGATE);writel_relaxed(config->asicctl, drvdata->base + ASICCTL);writel_relaxed(config->ctiappset, drvdata->base + CTIAPPSET);
writel_relaxed(config->ctigate, drvdata->base + cti_offset(drvdata, INDEX_CTIGATE, 0));writel_relaxed(config->ctigate, cti_reg_addr(drvdata, CTIGATE));
And apply for the same cases below.
[...]
@@ -394,8 +447,8 @@ int cti_channel_trig_op(struct device *dev, enum cti_chan_op op,
/* update the local register values */ chan_bitmask = BIT(channel_idx);
reg_offset = (direction == CTI_TRIG_IN ? CTIINEN(trigger_idx) :CTIOUTEN(trigger_idx));
reg_offset = (direction == CTI_TRIG_IN ? cti_offset(drvdata, INDEX_CTIINEN, trigger_idx) :cti_offset(drvdata, INDEX_CTIOUTEN, trigger_idx));For readable, we can improve a bit with code alignment:
reg_offset = (direction == CTI_TRIG_IN) ? cti_reg_addr_with_nr(drvdata, CTIINENn, trigger_idx) : cti_reg_addr_with_nr(drvdata, CTIOUTENn, trigger_idx);
[...]
@@ -981,9 +1035,28 @@ static int cti_probe(struct amba_device *adev, const struct amba_id *id) drvdata->csdev_release = drvdata->csdev->dev.release; drvdata->csdev->dev.release = cti_device_release;
/* check architect value*/devarch = readl_relaxed(drvdata->base + CORESIGHT_DEVARCH);if (CTI_DEVARCH_ARCHITECT(devarch) == ARCHITECT_QCOM) {drvdata->subtype = QCOM_CTI;drvdata->offsets = cti_extended_offset;As a result, we can only set the is_qcom_cti flag:
drvdata->is_qcom_cti = true;
/** QCOM CTI does not implement Claimtag functionality as* per CoreSight specification, but its CLAIMSET register* is incorrectly initialized to 0xF. This can mislead* tools or drivers into thinking the component is claimed.** Reset CLAIMSET to 0 to reflect that no claims are active.*/writel_relaxed(0, drvdata->base + CORESIGHT_CLAIMSET);I am confused for this. If QCOM CTI does not implement claim tag, then what is the designed register at the offset CORESIGHT_CLAIMSET?
Should you bypass all claim tag related operations for QCOM CTI case? (I don't see you touch anything for claim and declaim tags).
} else {drvdata->subtype = ARM_STD_CTI;drvdata->offsets = cti_normal_offset;}/* all done - dec pm refcount */ pm_runtime_put(&adev->dev);
dev_info(&drvdata->csdev->dev, "CTI initialized\n");
dev_info(&drvdata->csdev->dev, "CTI initialized; subtype=%d\n", drvdata->subtype);dev_info(&drvdata->csdev->dev, "%s CTI initialized\n", drvdata->is_qcom_cti ? "QCOM" : "");
return 0;pm_release: diff --git a/drivers/hwtracing/coresight/coresight-cti-sysfs.c b/drivers/hwtracing/coresight/coresight-cti-sysfs.c index a9df77215141..12a495382999 100644 --- a/drivers/hwtracing/coresight/coresight-cti-sysfs.c +++ b/drivers/hwtracing/coresight/coresight-cti-sysfs.c @@ -172,9 +172,8 @@ static struct attribute *coresight_cti_attrs[] = {
/* register based attributes */
-/* Read registers with power check only (no enable check). */ -static ssize_t coresight_cti_reg_show(struct device *dev,
struct device_attribute *attr, char *buf)+static ssize_t coresight_cti_mgmt_reg_show(struct device *dev,
struct device_attribute *attr, char *buf){ struct cti_drvdata *drvdata = dev_get_drvdata(dev->parent); struct cs_off_attribute *cti_attr = container_of(attr, struct cs_off_attribute, attr); @@ -189,6 +188,40 @@ static ssize_t coresight_cti_reg_show(struct device *dev, return sysfs_emit(buf, "0x%x\n", val); }
+/* Read registers with power check only (no enable check). */ +static ssize_t coresight_cti_reg_show(struct device *dev,
struct device_attribute *attr, char *buf)+{
struct cti_drvdata *drvdata = dev_get_drvdata(dev->parent);struct cs_off_attribute *cti_attr = container_of(attr, struct cs_off_attribute, attr);u32 idx, val = 0;pm_runtime_get_sync(dev->parent);raw_spin_lock(&drvdata->spinlock);idx = drvdata->config.ext_reg_sel;if (drvdata->config.hw_powered) {switch (cti_attr->off) {case INDEX_CTITRIGINSTATUS:case INDEX_CTITRIGOUTSTATUS:case INDEX_ITTRIGINACK:case INDEX_ITTRIGOUT:case INDEX_ITTRIGOUTACK:case INDEX_ITTRIGIN:val = readl_relaxed(drvdata->base +cti_offset(drvdata, cti_attr->off, idx));break;default:val = readl_relaxed(drvdata->base + cti_offset(drvdata, cti_attr->off, 0));break;}}raw_spin_unlock(&drvdata->spinlock);pm_runtime_put_sync(dev->parent);return sysfs_emit(buf, "0x%x\n", val);+}
/* Write registers with power check only (no enable check). */ static __maybe_unused ssize_t coresight_cti_reg_store(struct device *dev, struct device_attribute *attr, @@ -197,19 +230,39 @@ static __maybe_unused ssize_t coresight_cti_reg_store(struct device *dev, struct cti_drvdata *drvdata = dev_get_drvdata(dev->parent); struct cs_off_attribute *cti_attr = container_of(attr, struct cs_off_attribute, attr); unsigned long val = 0;
u32 idx; if (kstrtoul(buf, 0, &val)) return -EINVAL; pm_runtime_get_sync(dev->parent); raw_spin_lock(&drvdata->spinlock);
if (drvdata->config.hw_powered)cti_write_single_reg(drvdata, cti_attr->off, val);
idx = drvdata->config.ext_reg_sel;if (drvdata->config.hw_powered) {switch (cti_attr->off) {case INDEX_ITTRIGINACK:case INDEX_ITTRIGOUT:cti_write_single_reg(drvdata, cti_offset(drvdata, cti_attr->off, idx), val);break;default:cti_write_single_reg(drvdata, cti_offset(drvdata, cti_attr->off, 0), val);break;}}For both coresight_cti_reg_show() and coresight_cti_reg_store(), can we always use "cti_attr->off" as the offset for regitser access? I mean we don't need the extra config.ext_reg_sel, eventually any register we can calculate a offset for it.
raw_spin_unlock(&drvdata->spinlock); pm_runtime_put_sync(dev->parent); return size;}
+#define coresight_cti_mgmt_reg(name, offset) \
(&((struct cs_off_attribute[]) { \{ \__ATTR(name, 0444, coresight_cti_mgmt_reg_show, NULL), \offset \} \})[0].attr.attr)#define coresight_cti_reg(name, offset) \ (&((struct cs_off_attribute[]) { \ { \ @@ -237,17 +290,17 @@ static __maybe_unused ssize_t coresight_cti_reg_store(struct device *dev,
/* coresight management registers */ static struct attribute *coresight_cti_mgmt_attrs[] = {
coresight_cti_reg(devaff0, CTIDEVAFF0),coresight_cti_reg(devaff1, CTIDEVAFF1),coresight_cti_reg(authstatus, CORESIGHT_AUTHSTATUS),coresight_cti_reg(devarch, CORESIGHT_DEVARCH),coresight_cti_reg(devid, CORESIGHT_DEVID),coresight_cti_reg(devtype, CORESIGHT_DEVTYPE),coresight_cti_reg(pidr0, CORESIGHT_PERIPHIDR0),coresight_cti_reg(pidr1, CORESIGHT_PERIPHIDR1),coresight_cti_reg(pidr2, CORESIGHT_PERIPHIDR2),coresight_cti_reg(pidr3, CORESIGHT_PERIPHIDR3),coresight_cti_reg(pidr4, CORESIGHT_PERIPHIDR4),
coresight_cti_mgmt_reg(devaff0, CTIDEVAFF0),coresight_cti_mgmt_reg(devaff1, CTIDEVAFF1),coresight_cti_mgmt_reg(authstatus, CORESIGHT_AUTHSTATUS),coresight_cti_mgmt_reg(devarch, CORESIGHT_DEVARCH),coresight_cti_mgmt_reg(devid, CORESIGHT_DEVID),coresight_cti_mgmt_reg(devtype, CORESIGHT_DEVTYPE),coresight_cti_mgmt_reg(pidr0, CORESIGHT_PERIPHIDR0),coresight_cti_mgmt_reg(pidr1, CORESIGHT_PERIPHIDR1),coresight_cti_mgmt_reg(pidr2, CORESIGHT_PERIPHIDR2),coresight_cti_mgmt_reg(pidr3, CORESIGHT_PERIPHIDR3),coresight_cti_mgmt_reg(pidr4, CORESIGHT_PERIPHIDR4),I don't see any benefit for updating from coresight_cti_reg() to coresight_cti_mgmt_reg(). If really want to do this, should remove the macro coresight_cti_reg()?
NULL,};
@@ -258,13 +311,15 @@ static struct attribute *coresight_cti_mgmt_attrs[] = {
- If inaccessible & pcached_val not NULL then show cached value.
*/ static ssize_t cti_reg32_show(struct device *dev, char *buf,
u32 *pcached_val, int reg_offset)
u32 *pcached_val, int index)We don't need to change anything for this. The passed "reg_offset" should be always a final offset, no matter for standard CTI or QCOM case, the driver directly uses the offset for register access.
[...]
+/*
- QCOM CTI supports up to 128 triggers, there are 6 registers need to be
- expanded to up to 4 instances, and ext_reg_sel can be used to indicate
- which one is in use.
- CTITRIGINSTATUS, CTITRIGOUTSTATUS,
- ITTRIGIN, ITTRIGOUT,
- ITTRIGINACK, ITTRIGOUTACK.
- */
+static ssize_t ext_reg_sel_show(struct device *dev,
struct device_attribute *attr,char *buf)+{
u32 val;struct cti_drvdata *drvdata = dev_get_drvdata(dev->parent);raw_spin_lock(&drvdata->spinlock);val = drvdata->config.ext_reg_sel;raw_spin_unlock(&drvdata->spinlock);return sprintf(buf, "%d\n", val);+}
+static ssize_t ext_reg_sel_store(struct device *dev,
struct device_attribute *attr,const char *buf, size_t size)+{
unsigned long val;struct cti_drvdata *drvdata = dev_get_drvdata(dev->parent);if (kstrtoul(buf, 0, &val))return -EINVAL;if (val > ((drvdata->config.nr_trig_max + 31) / 32 - 1))return -EINVAL;raw_spin_lock(&drvdata->spinlock);drvdata->config.ext_reg_sel = val;raw_spin_unlock(&drvdata->spinlock);return size;+}
As said, I don't think the trigger register is any different from other register access. So the existed APIs would be sufficient.
As a result, we don't need to add two above functions.
Thanks, Leo
Hi
On Wed, 3 Dec 2025 at 18:29, Leo Yan leo.yan@arm.com wrote:
On Tue, Dec 02, 2025 at 02:42:21PM +0800, Yingchao Deng wrote:
The QCOM extended CTI is a heavily parameterized version of ARM’s CSCTI. It allows a debugger to send to trigger events to a processor or to send a trigger event to one or more processors when a trigger event occurs on another processor on the same SoC, or even between SoCs. Qualcomm CTI implementation differs from the standard CTI in the following aspects:
- The number of supported triggers is extended to 128.
- Several register offsets differ from the CoreSight specification.
I apologize for my late review of this series. For easier maintenance later, I have several comments for register access.
[...]
+static const u32 cti_normal_offset[] = {
[INDEX_CTIINTACK] = CTIINTACK,[INDEX_CTIAPPSET] = CTIAPPSET,[INDEX_CTIAPPCLEAR] = CTIAPPCLEAR,[INDEX_CTIAPPPULSE] = CTIAPPPULSE,[INDEX_CTIINEN] = CTIINEN(0),[INDEX_CTIOUTEN] = CTIOUTEN(0),I prefer to update the these two macros to CTIINENn and CTIOUTENn, as later we will not use CTIINEN(n) and CTIOUTEN(n) anymore.
[INDEX_CTITRIGINSTATUS] = CTITRIGINSTATUS,[INDEX_CTITRIGOUTSTATUS] = CTITRIGOUTSTATUS,[INDEX_CTICHINSTATUS] = CTICHINSTATUS,[INDEX_CTICHOUTSTATUS] = CTICHOUTSTATUS,[INDEX_CTIGATE] = CTIGATE,[INDEX_ASICCTL] = ASICCTL,[INDEX_ITCHINACK] = ITCHINACK,[INDEX_ITTRIGINACK] = ITTRIGINACK,[INDEX_ITCHOUT] = ITCHOUT,[INDEX_ITTRIGOUT] = ITTRIGOUT,[INDEX_ITCHOUTACK] = ITCHOUTACK,[INDEX_ITTRIGOUTACK] = ITTRIGOUTACK,[INDEX_ITCHIN] = ITCHIN,[INDEX_ITTRIGIN] = ITTRIGIN,[INDEX_ITCTRL] = CORESIGHT_ITCTRL,+};
+static const u32 cti_extended_offset[] = {
[INDEX_CTIINTACK] = QCOM_CTIINTACK,[INDEX_CTIAPPSET] = QCOM_CTIAPPSET,[INDEX_CTIAPPCLEAR] = QCOM_CTIAPPCLEAR,[INDEX_CTIAPPPULSE] = QCOM_CTIAPPPULSE,[INDEX_CTIINEN] = QCOM_CTIINEN,[INDEX_CTIOUTEN] = QCOM_CTIOUTEN,[INDEX_CTITRIGINSTATUS] = QCOM_CTITRIGINSTATUS,[INDEX_CTITRIGOUTSTATUS] = QCOM_CTITRIGOUTSTATUS,[INDEX_CTICHINSTATUS] = QCOM_CTICHINSTATUS,[INDEX_CTICHOUTSTATUS] = QCOM_CTICHOUTSTATUS,[INDEX_CTIGATE] = QCOM_CTIGATE,[INDEX_ASICCTL] = QCOM_ASICCTL,[INDEX_ITCHINACK] = QCOM_ITCHINACK,[INDEX_ITTRIGINACK] = QCOM_ITTRIGINACK,[INDEX_ITCHOUT] = QCOM_ITCHOUT,[INDEX_ITTRIGOUT] = QCOM_ITTRIGOUT,[INDEX_ITCHOUTACK] = QCOM_ITCHOUTACK,[INDEX_ITTRIGOUTACK] = QCOM_ITTRIGOUTACK,[INDEX_ITCHIN] = QCOM_ITCHIN,[INDEX_ITTRIGIN] = QCOM_ITTRIGIN,[INDEX_ITCTRL] = CORESIGHT_ITCTRL,+};
I saw CTI registers are within 4KiB (0x1000), we can don't convert standard regiserts and only convert to QCOM register based on the standard ones. So you can drop the cti_normal_offset strucuture and only have a cti_reg_qcom_offset[] struct:
static const u32 cti_extended_offset[] = { [CTIINTACK] = QCOM_CTIINTACK, [CTIAPPSET] = QCOM_CTIAPPSET, [CTIAPPCLEAR] = QCOM_CTIAPPCLEAR, [CTIAPPPULSE] = QCOM_CTIAPPPULSE, [CTIINEN] = QCOM_CTIINEN, ... };
Then you could create two helpers for register address:
static void __iomem *cti_reg_addr_with_nr(struct cti_drvdata *drvdata, u32 reg, u32 nr) { /* convert to qcom specific offset */ if (unlikely(drvdata->is_qcom_cti)) reg = cti_extended_offset[reg]; return drvdata->base + reg + sizeof(u32) * nr; } static void __iomem *cti_reg_addr(struct cti_drvdata *drvdata, u32 reg) { return cti_reg_addr_with_nr(drvdata, reg, 0); }/*
- CTI devices can be associated with a PE, or be connected to CoreSight
@@ -70,15 +119,16 @@ void cti_write_all_hw_regs(struct cti_drvdata *drvdata)
/* write the CTI trigger registers */ for (i = 0; i < config->nr_trig_max; i++) {
writel_relaxed(config->ctiinen[i], drvdata->base + CTIINEN(i));
writel_relaxed(config->ctiinen[i],drvdata->base + cti_offset(drvdata, INDEX_CTIINEN, i));writel_relaxed(config->ctiinen[i], cti_reg_addr_with_nr(drvdata, CTIINENn, i));
And apply for the same cases below.
/* other regs */
writel_relaxed(config->ctigate, drvdata->base + CTIGATE);writel_relaxed(config->asicctl, drvdata->base + ASICCTL);writel_relaxed(config->ctiappset, drvdata->base + CTIAPPSET);
writel_relaxed(config->ctigate, drvdata->base + cti_offset(drvdata, INDEX_CTIGATE, 0));writel_relaxed(config->ctigate, cti_reg_addr(drvdata, CTIGATE));
And apply for the same cases below.
[...]
@@ -394,8 +447,8 @@ int cti_channel_trig_op(struct device *dev, enum cti_chan_op op,
/* update the local register values */ chan_bitmask = BIT(channel_idx);
reg_offset = (direction == CTI_TRIG_IN ? CTIINEN(trigger_idx) :CTIOUTEN(trigger_idx));
reg_offset = (direction == CTI_TRIG_IN ? cti_offset(drvdata, INDEX_CTIINEN, trigger_idx) :cti_offset(drvdata, INDEX_CTIOUTEN, trigger_idx));For readable, we can improve a bit with code alignment:
reg_offset = (direction == CTI_TRIG_IN) ? cti_reg_addr_with_nr(drvdata, CTIINENn, trigger_idx) : cti_reg_addr_with_nr(drvdata, CTIOUTENn, trigger_idx);
[...]
@@ -981,9 +1035,28 @@ static int cti_probe(struct amba_device *adev, const struct amba_id *id) drvdata->csdev_release = drvdata->csdev->dev.release; drvdata->csdev->dev.release = cti_device_release;
/* check architect value*/devarch = readl_relaxed(drvdata->base + CORESIGHT_DEVARCH);if (CTI_DEVARCH_ARCHITECT(devarch) == ARCHITECT_QCOM) {drvdata->subtype = QCOM_CTI;drvdata->offsets = cti_extended_offset;As a result, we can only set the is_qcom_cti flag:
drvdata->is_qcom_cti = true;
/** QCOM CTI does not implement Claimtag functionality as* per CoreSight specification, but its CLAIMSET register* is incorrectly initialized to 0xF. This can mislead* tools or drivers into thinking the component is claimed.** Reset CLAIMSET to 0 to reflect that no claims are active.*/writel_relaxed(0, drvdata->base + CORESIGHT_CLAIMSET);I am confused for this. If QCOM CTI does not implement claim tag, then what is the designed register at the offset CORESIGHT_CLAIMSET?
Should you bypass all claim tag related operations for QCOM CTI case? (I don't see you touch anything for claim and declaim tags).
The patch I have created to handle systems without correct claim tag operation is a dependency for this patch set. Thus no need for override here as the core code will handle this correctly.
The only issue is ensuring the non-CTI spec implementation will result in the correct detection of no claim tags present.
Regards
Miike
} else {drvdata->subtype = ARM_STD_CTI;drvdata->offsets = cti_normal_offset;}/* all done - dec pm refcount */ pm_runtime_put(&adev->dev);
dev_info(&drvdata->csdev->dev, "CTI initialized\n");
dev_info(&drvdata->csdev->dev, "CTI initialized; subtype=%d\n", drvdata->subtype);dev_info(&drvdata->csdev->dev, "%s CTI initialized\n", drvdata->is_qcom_cti ? "QCOM" : "");
return 0;pm_release: diff --git a/drivers/hwtracing/coresight/coresight-cti-sysfs.c b/drivers/hwtracing/coresight/coresight-cti-sysfs.c index a9df77215141..12a495382999 100644 --- a/drivers/hwtracing/coresight/coresight-cti-sysfs.c +++ b/drivers/hwtracing/coresight/coresight-cti-sysfs.c @@ -172,9 +172,8 @@ static struct attribute *coresight_cti_attrs[] = {
/* register based attributes */
-/* Read registers with power check only (no enable check). */ -static ssize_t coresight_cti_reg_show(struct device *dev,
struct device_attribute *attr, char *buf)+static ssize_t coresight_cti_mgmt_reg_show(struct device *dev,
struct device_attribute *attr, char *buf){ struct cti_drvdata *drvdata = dev_get_drvdata(dev->parent); struct cs_off_attribute *cti_attr = container_of(attr, struct cs_off_attribute, attr); @@ -189,6 +188,40 @@ static ssize_t coresight_cti_reg_show(struct device *dev, return sysfs_emit(buf, "0x%x\n", val); }
+/* Read registers with power check only (no enable check). */ +static ssize_t coresight_cti_reg_show(struct device *dev,
struct device_attribute *attr, char *buf)+{
struct cti_drvdata *drvdata = dev_get_drvdata(dev->parent);struct cs_off_attribute *cti_attr = container_of(attr, struct cs_off_attribute, attr);u32 idx, val = 0;pm_runtime_get_sync(dev->parent);raw_spin_lock(&drvdata->spinlock);idx = drvdata->config.ext_reg_sel;if (drvdata->config.hw_powered) {switch (cti_attr->off) {case INDEX_CTITRIGINSTATUS:case INDEX_CTITRIGOUTSTATUS:case INDEX_ITTRIGINACK:case INDEX_ITTRIGOUT:case INDEX_ITTRIGOUTACK:case INDEX_ITTRIGIN:val = readl_relaxed(drvdata->base +cti_offset(drvdata, cti_attr->off, idx));break;default:val = readl_relaxed(drvdata->base + cti_offset(drvdata, cti_attr->off, 0));break;}}raw_spin_unlock(&drvdata->spinlock);pm_runtime_put_sync(dev->parent);return sysfs_emit(buf, "0x%x\n", val);+}
/* Write registers with power check only (no enable check). */ static __maybe_unused ssize_t coresight_cti_reg_store(struct device *dev, struct device_attribute *attr, @@ -197,19 +230,39 @@ static __maybe_unused ssize_t coresight_cti_reg_store(struct device *dev, struct cti_drvdata *drvdata = dev_get_drvdata(dev->parent); struct cs_off_attribute *cti_attr = container_of(attr, struct cs_off_attribute, attr); unsigned long val = 0;
u32 idx; if (kstrtoul(buf, 0, &val)) return -EINVAL; pm_runtime_get_sync(dev->parent); raw_spin_lock(&drvdata->spinlock);
if (drvdata->config.hw_powered)cti_write_single_reg(drvdata, cti_attr->off, val);
idx = drvdata->config.ext_reg_sel;if (drvdata->config.hw_powered) {switch (cti_attr->off) {case INDEX_ITTRIGINACK:case INDEX_ITTRIGOUT:cti_write_single_reg(drvdata, cti_offset(drvdata, cti_attr->off, idx), val);break;default:cti_write_single_reg(drvdata, cti_offset(drvdata, cti_attr->off, 0), val);break;}}For both coresight_cti_reg_show() and coresight_cti_reg_store(), can we always use "cti_attr->off" as the offset for regitser access? I mean we don't need the extra config.ext_reg_sel, eventually any register we can calculate a offset for it.
raw_spin_unlock(&drvdata->spinlock); pm_runtime_put_sync(dev->parent); return size;}
+#define coresight_cti_mgmt_reg(name, offset) \
(&((struct cs_off_attribute[]) { \{ \__ATTR(name, 0444, coresight_cti_mgmt_reg_show, NULL), \offset \} \})[0].attr.attr)#define coresight_cti_reg(name, offset) \ (&((struct cs_off_attribute[]) { \ { \ @@ -237,17 +290,17 @@ static __maybe_unused ssize_t coresight_cti_reg_store(struct device *dev,
/* coresight management registers */ static struct attribute *coresight_cti_mgmt_attrs[] = {
coresight_cti_reg(devaff0, CTIDEVAFF0),coresight_cti_reg(devaff1, CTIDEVAFF1),coresight_cti_reg(authstatus, CORESIGHT_AUTHSTATUS),coresight_cti_reg(devarch, CORESIGHT_DEVARCH),coresight_cti_reg(devid, CORESIGHT_DEVID),coresight_cti_reg(devtype, CORESIGHT_DEVTYPE),coresight_cti_reg(pidr0, CORESIGHT_PERIPHIDR0),coresight_cti_reg(pidr1, CORESIGHT_PERIPHIDR1),coresight_cti_reg(pidr2, CORESIGHT_PERIPHIDR2),coresight_cti_reg(pidr3, CORESIGHT_PERIPHIDR3),coresight_cti_reg(pidr4, CORESIGHT_PERIPHIDR4),
coresight_cti_mgmt_reg(devaff0, CTIDEVAFF0),coresight_cti_mgmt_reg(devaff1, CTIDEVAFF1),coresight_cti_mgmt_reg(authstatus, CORESIGHT_AUTHSTATUS),coresight_cti_mgmt_reg(devarch, CORESIGHT_DEVARCH),coresight_cti_mgmt_reg(devid, CORESIGHT_DEVID),coresight_cti_mgmt_reg(devtype, CORESIGHT_DEVTYPE),coresight_cti_mgmt_reg(pidr0, CORESIGHT_PERIPHIDR0),coresight_cti_mgmt_reg(pidr1, CORESIGHT_PERIPHIDR1),coresight_cti_mgmt_reg(pidr2, CORESIGHT_PERIPHIDR2),coresight_cti_mgmt_reg(pidr3, CORESIGHT_PERIPHIDR3),coresight_cti_mgmt_reg(pidr4, CORESIGHT_PERIPHIDR4),I don't see any benefit for updating from coresight_cti_reg() to coresight_cti_mgmt_reg(). If really want to do this, should remove the macro coresight_cti_reg()?
NULL,};
@@ -258,13 +311,15 @@ static struct attribute *coresight_cti_mgmt_attrs[] = {
- If inaccessible & pcached_val not NULL then show cached value.
*/ static ssize_t cti_reg32_show(struct device *dev, char *buf,
u32 *pcached_val, int reg_offset)
u32 *pcached_val, int index)We don't need to change anything for this. The passed "reg_offset" should be always a final offset, no matter for standard CTI or QCOM case, the driver directly uses the offset for register access.
[...]
+/*
- QCOM CTI supports up to 128 triggers, there are 6 registers need to be
- expanded to up to 4 instances, and ext_reg_sel can be used to indicate
- which one is in use.
- CTITRIGINSTATUS, CTITRIGOUTSTATUS,
- ITTRIGIN, ITTRIGOUT,
- ITTRIGINACK, ITTRIGOUTACK.
- */
+static ssize_t ext_reg_sel_show(struct device *dev,
struct device_attribute *attr,char *buf)+{
u32 val;struct cti_drvdata *drvdata = dev_get_drvdata(dev->parent);raw_spin_lock(&drvdata->spinlock);val = drvdata->config.ext_reg_sel;raw_spin_unlock(&drvdata->spinlock);return sprintf(buf, "%d\n", val);+}
+static ssize_t ext_reg_sel_store(struct device *dev,
struct device_attribute *attr,const char *buf, size_t size)+{
unsigned long val;struct cti_drvdata *drvdata = dev_get_drvdata(dev->parent);if (kstrtoul(buf, 0, &val))return -EINVAL;if (val > ((drvdata->config.nr_trig_max + 31) / 32 - 1))return -EINVAL;raw_spin_lock(&drvdata->spinlock);drvdata->config.ext_reg_sel = val;raw_spin_unlock(&drvdata->spinlock);return size;+}
As said, I don't think the trigger register is any different from other register access. So the existed APIs would be sufficient.
As a result, we don't need to add two above functions.
Thanks, Leo
On Thu, Dec 04, 2025 at 09:04:03AM +0000, Mike Leach wrote:
Hi,
On Thu, 4 Dec 2025 at 08:38, Leo Yan leo.yan@arm.com wrote:
On Wed, Dec 03, 2025 at 06:29:44PM +0000, Coresight ML wrote:
[...]
+/* Read registers with power check only (no enable check). */ +static ssize_t coresight_cti_reg_show(struct device *dev,
struct device_attribute *attr, char *buf)+{
- struct cti_drvdata *drvdata = dev_get_drvdata(dev->parent);
- struct cs_off_attribute *cti_attr = container_of(attr, struct cs_off_attribute, attr);
- u32 idx, val = 0;
- pm_runtime_get_sync(dev->parent);
- raw_spin_lock(&drvdata->spinlock);
- idx = drvdata->config.ext_reg_sel;
- if (drvdata->config.hw_powered) {
switch (cti_attr->off) {case INDEX_CTITRIGINSTATUS:case INDEX_CTITRIGOUTSTATUS:case INDEX_ITTRIGINACK:case INDEX_ITTRIGOUT:case INDEX_ITTRIGOUTACK:case INDEX_ITTRIGIN:I read again and now I understand why you need "config.ext_reg_sel" as an index for these expending registers.
Having this index for these extended registers matches what we do for the INEN and OUTEN registers. This gives the user a consistent approach. We do not want the unnecessary attributes as it will increase the memory footprint for all cti instances, not just the qcom ones.
I agree with using index for CTI triggers, but it is not necessary to add a new index for other registers (status, mode setting, ACK, etc).
It would be directive to present the status and mode setting registers, given these registers are only from 0-3. This will be easy accessed from userspace, and avoid complexity in the driver.
The first patch in this series works to reduce the memory footprint by only allocating resource based on the actual configuration. For example for an ARM designed CTI with 8 trigger registers, we no longer declare static 128 x 32 bit arrays for each of INEN and OUTEN which were required by the original design.
Given that there can be 10s or 100s of CTIs in a large multicore system, reducing the footprint to match the actual configuration, and offering a level of compression by using an index + single file to access a set of registers improves the efficiency of the driver.
It is good for reducing footprint, but I would give priority for a neat implementation and easy use interfaces.
And the sysfs attr code and global structures (e.g. register conversion struct) can shared by all instances, so I don't worry much the scale issue if we extend them.
Thanks, Leo
On Thu, Dec 04, 2025 at 09:07:56AM +0000, Mike Leach wrote:
[...]
I saw CTI registers are within 4KiB (0x1000), we can don't convert standard regiserts and only convert to QCOM register based on the standard ones. So you can drop the cti_normal_offset strucuture and only have a cti_reg_qcom_offset[] struct:
static const u32 cti_extended_offset[] = { [CTIINTACK] = QCOM_CTIINTACK, [CTIAPPSET] = QCOM_CTIAPPSET, [CTIAPPCLEAR] = QCOM_CTIAPPCLEAR, [CTIAPPPULSE] = QCOM_CTIAPPPULSE, [CTIINEN] = QCOM_CTIINEN, ... };
I suggested the dual offset approach a couple of patchset revisions ago as it actually simplifies the code & makes it more efficient. The offset array in use is set during probe and the remaining code is then common to both without lots of "if qcom else " occurences.
AFAICS, we will handle the QCOM CTI particularly in three cases:
1) The register access; 2) The claim tag; 3) Sysfs attr is visible.
Now we are discussing the reigster access. As suggested, the "if qcom / else" is encapsulated (e.g., in cti_reg_addr_with_nr()), it will not spread out.
I'd use standard registers by default and convert to non-standard ones only when needed. A new "neutral" index layer seems redundant, as the existing standard register indexes already serve this purpose.
For the sysfs attrs, it makes sense to use a central place to decide which knobs are only visible for QCOM CTI, otherwise, we also will not spread the condition check.
I will reply separately for claim tag issue.
Thanks, Leo
On Thu, Dec 04, 2025 at 09:15:07AM +0000, Mike Leach wrote:
[...]
/** QCOM CTI does not implement Claimtag functionality as* per CoreSight specification, but its CLAIMSET register* is incorrectly initialized to 0xF. This can mislead* tools or drivers into thinking the component is claimed.** Reset CLAIMSET to 0 to reflect that no claims are active.*/writel_relaxed(0, drvdata->base + CORESIGHT_CLAIMSET);I am confused for this. If QCOM CTI does not implement claim tag, then what is the designed register at the offset CORESIGHT_CLAIMSET?
Should you bypass all claim tag related operations for QCOM CTI case? (I don't see you touch anything for claim and declaim tags).
The patch I have created to handle systems without correct claim tag operation is a dependency for this patch set. Thus no need for override here as the core code will handle this correctly.
The only issue is ensuring the non-CTI spec implementation will result in the correct detection of no claim tags present.
Your patch works only when a module has implemented claim registers.
This leads to two issues: we end up clearing an unknown register in the CTI driver, and then the coresight core layer assumes it is reading a claim register even though it is not.
For QCOM CTI, combined with your patch, I would suggest directly setting csdev->access.claim_tag_impl to false (perhaps using a helper). This would be much clearer than the "hacking" way.
Thanks, Leo
Hi Leo
On Thu, 4 Dec 2025 at 10:47, Leo Yan leo.yan@arm.com wrote:
On Thu, Dec 04, 2025 at 09:15:07AM +0000, Mike Leach wrote:
[...]
/** QCOM CTI does not implement Claimtag functionality as* per CoreSight specification, but its CLAIMSET register* is incorrectly initialized to 0xF. This can mislead* tools or drivers into thinking the component is claimed.** Reset CLAIMSET to 0 to reflect that no claims are active.*/writel_relaxed(0, drvdata->base + CORESIGHT_CLAIMSET);I am confused for this. If QCOM CTI does not implement claim tag, then what is the designed register at the offset CORESIGHT_CLAIMSET?
Should you bypass all claim tag related operations for QCOM CTI case? (I don't see you touch anything for claim and declaim tags).
The patch I have created to handle systems without correct claim tag operation is a dependency for this patch set. Thus no need for override here as the core code will handle this correctly.
The only issue is ensuring the non-CTI spec implementation will result in the correct detection of no claim tags present.
Your patch works only when a module has implemented claim registers.
Per the Coresight spec - unimplemented registers must be RAZ/WI- so this still works for non implemented claim registers.
This leads to two issues: we end up clearing an unknown register in the CTI driver, and then the coresight core layer assumes it is reading a claim register even though it is not.
Again RAZ will simply read 0x0 - which is an indication that there are no claim bits implemented.
For QCOM CTI, combined with your patch, I would suggest directly setting csdev->access.claim_tag_impl to false (perhaps using a helper).
That would be a better solution, though as Qcom appear to have implemented a pair of standard RW registers rather than the claim tag functionality, the write solution works for this particular implementation.
Regards
Mike
This would be much clearer than the "hacking" way.
Thanks, Leo
Hi Leo
On Thu, 4 Dec 2025 at 10:31, Leo Yan leo.yan@arm.com wrote:
On Thu, Dec 04, 2025 at 09:07:56AM +0000, Mike Leach wrote:
[...]
I saw CTI registers are within 4KiB (0x1000), we can don't convert standard regiserts and only convert to QCOM register based on the standard ones. So you can drop the cti_normal_offset strucuture and only have a cti_reg_qcom_offset[] struct:
static const u32 cti_extended_offset[] = { [CTIINTACK] = QCOM_CTIINTACK, [CTIAPPSET] = QCOM_CTIAPPSET, [CTIAPPCLEAR] = QCOM_CTIAPPCLEAR, [CTIAPPPULSE] = QCOM_CTIAPPPULSE, [CTIINEN] = QCOM_CTIINEN, ... };
The tables in the patch are
[reg_type_array_index] = offset_address;
e.g.
[INDEX_CTIINTACK] = QCOM_CTIINTACK
which resolves to
[1] = 0x020
where index is constant for a given register type,
As far as I can tell what you have suggested above is a table that is
[std_addr_offset] = qcom_addr_offset;
e.g.
[CTIINTACK] = QCOM_CTIINTACK,
which resolves to
[0x10] = 0x020
which does not appear to work correctly?
The registers are sparsely spread across the memory map, so a simple mapping does not work, even if we divide the original offset by 4 to create a register number.
The largest standard offset we have is ITTRIGIN = 0xEF8, so assuming the compiler allows us to sparselly populate the array (which I think it does, along with some padding), we end up with an array of at least 0xEF8 elements, rather then the indexed 21?
Regards
Mike
I suggested the dual offset approach a couple of patchset revisions ago as it actually simplifies the code & makes it more efficient. The offset array in use is set during probe and the remaining code is then common to both without lots of "if qcom else " occurences.
AFAICS, we will handle the QCOM CTI particularly in three cases:
- The register access;
- The claim tag;
- Sysfs attr is visible.
Now we are discussing the reigster access. As suggested, the "if qcom / else" is encapsulated (e.g., in cti_reg_addr_with_nr()), it will not spread out.
I'd use standard registers by default and convert to non-standard ones only when needed. A new "neutral" index layer seems redundant, as the existing standard register indexes already serve this purpose.
For the sysfs attrs, it makes sense to use a central place to decide which knobs are only visible for QCOM CTI, otherwise, we also will not spread the condition check.
I will reply separately for claim tag issue.
Thanks, Leo
-- Mike Leach Principal Engineer, ARM Ltd. Manchester Design Centre. UK
Hi Mike,
On Thu, Dec 04, 2025 at 04:17:35PM +0000, Mike Leach wrote:
[...]
The tables in the patch are
[reg_type_array_index] = offset_address;e.g.
[INDEX_CTIINTACK] = QCOM_CTIINTACK
which resolves to
[1] = 0x020
where index is constant for a given register type,
As far as I can tell what you have suggested above is a table that is
[std_addr_offset] = qcom_addr_offset;
e.g.
[CTIINTACK] = QCOM_CTIINTACK,
which resolves to
[0x10] = 0x020
which does not appear to work correctly?
The registers are sparsely spread across the memory map, so a simple mapping does not work, even if we divide the original offset by 4 to create a register number.
This should work. Though the array is not filled for each item, but it will return back 0x20 when we access array[0x10], I don't see problem here.
The largest standard offset we have is ITTRIGIN = 0xEF8, so assuming the compiler allows us to sparselly populate the array (which I think it does, along with some padding), we end up with an array of at least 0xEF8 elements, rather then the indexed 21?
I tested locally and did not see the GCC complaint for this approach. And this is a global structure with about 16KiB (~4K items x sizeof(u32)), we don't need to worry about scaling issue as it is shared by device instances.
If you dislike this way, then a static function also can fulfill the same task, something like:
static noinline u32 cti_qcom_reg_off(u32 offset) { switch (offset) { CTIINTACK: return QCOM_CTIINTACK; CTIAPPSET: return QCOM_CTIAPPSET; ... default: WARN(1, "Unknown offset=%u\n", offset); return 0; }
/* Should not run here, just for compiling */ return 0; }
Thanks, Leo
On Thu, Dec 04, 2025 at 03:07:10PM +0000, Mike Leach wrote:
[...]
/** QCOM CTI does not implement Claimtag functionality as* per CoreSight specification, but its CLAIMSET register* is incorrectly initialized to 0xF. This can mislead* tools or drivers into thinking the component is claimed.** Reset CLAIMSET to 0 to reflect that no claims are active.*/writel_relaxed(0, drvdata->base + CORESIGHT_CLAIMSET);I am confused for this. If QCOM CTI does not implement claim tag, then what is the designed register at the offset CORESIGHT_CLAIMSET?
Should you bypass all claim tag related operations for QCOM CTI case? (I don't see you touch anything for claim and declaim tags).
The patch I have created to handle systems without correct claim tag operation is a dependency for this patch set. Thus no need for override here as the core code will handle this correctly.
The only issue is ensuring the non-CTI spec implementation will result in the correct detection of no claim tags present.
Your patch works only when a module has implemented claim registers.
Per the Coresight spec - unimplemented registers must be RAZ/WI- so this still works for non implemented claim registers.
QCOM CTI does not follow the spec in two aspects:
- Given the patch's comment: "QCOM CTI does not implement Claim tag functionality as per CoreSight specification", I am suspect the CLAIM registers are not implemented at all in QCOM CTI.
- It neither follows up the "unimplemented registers must be RAZ/WI" - the patch says its reset value is 0xF and then even can write 0 to it.
This leads to two issues: we end up clearing an unknown register in the CTI driver, and then the coresight core layer assumes it is reading a claim register even though it is not.
Again RAZ will simply read 0x0 - which is an indication that there are no claim bits implemented.
For QCOM CTI, combined with your patch, I would suggest directly setting csdev->access.claim_tag_impl to false (perhaps using a helper).
That would be a better solution, though as Qcom appear to have implemented a pair of standard RW registers rather than the claim tag functionality, the write solution works for this particular implementation.
If an IP violates both the rules for implemented claim registers and the rules for non-implemented claim registers, how can we rely on these registers to detect the claim feature?
My feeling is we are building a house on sand - these registers are not used for claim tags purpose at all.
Thanks, Leo
Hi Leo,
On Fri, 5 Dec 2025 at 10:27, Leo Yan leo.yan@arm.com wrote:
On Thu, Dec 04, 2025 at 03:07:10PM +0000, Mike Leach wrote:
[...]
/** QCOM CTI does not implement Claimtag functionality as* per CoreSight specification, but its CLAIMSET register* is incorrectly initialized to 0xF. This can mislead* tools or drivers into thinking the component is claimed.** Reset CLAIMSET to 0 to reflect that no claims are active.*/writel_relaxed(0, drvdata->base + CORESIGHT_CLAIMSET);I am confused for this. If QCOM CTI does not implement claim tag, then what is the designed register at the offset CORESIGHT_CLAIMSET?
Should you bypass all claim tag related operations for QCOM CTI case? (I don't see you touch anything for claim and declaim tags).
The patch I have created to handle systems without correct claim tag operation is a dependency for this patch set. Thus no need for override here as the core code will handle this correctly.
The only issue is ensuring the non-CTI spec implementation will result in the correct detection of no claim tags present.
Your patch works only when a module has implemented claim registers.
Per the Coresight spec - unimplemented registers must be RAZ/WI- so this still works for non implemented claim registers.
QCOM CTI does not follow the spec in two aspects:
Given the patch's comment: "QCOM CTI does not implement Claim tag functionality as per CoreSight specification", I am suspect the CLAIM registers are not implemented at all in QCOM CTI.
It neither follows up the "unimplemented registers must be RAZ/WI" - the patch says its reset value is 0xF and then even can write 0 to it.
Correct - but my point is that the patch to handle lack of claim tags does work for both existent claim tags and unimplemented ones, where the spec is followed.
This leads to two issues: we end up clearing an unknown register in the CTI driver, and then the coresight core layer assumes it is reading a claim register even though it is not.
Again RAZ will simply read 0x0 - which is an indication that there are no claim bits implemented.
For QCOM CTI, combined with your patch, I would suggest directly setting csdev->access.claim_tag_impl to false (perhaps using a helper).
That would be a better solution, though as Qcom appear to have implemented a pair of standard RW registers rather than the claim tag functionality, the write solution works for this particular implementation.
If an IP violates both the rules for implemented claim registers and the rules for non-implemented claim registers, how can we rely on these registers to detect the claim feature?
My feeling is we are building a house on sand - these registers are not used for claim tags purpose at all.
No they are not, but by either writing the claim tag register with 0, or by setting the csdev->access.claim_tag_impl to false, this particular incorrect implementation instance can be made to work with the existing claim tag code.
There are effectively two issues here:-
1) The existing core code does not correctly handle insufficient claim tags or none existent claim tag register - per the coresight spec. This needs fixing for all potential components - not just this one. My claim tag patch does that.
2) This particular QCom CTI does not implement claim tags correctly. This particular instance can be made to operate correctly as detailed above as long as the core code can handle none-existent claim tags. The alternative is to bracket every claim tag access call with an "if !qcom_cti" statement. (which is what was happening in earlier patches).
We need to fix the core code anyway, and the specific QCom CTI can then be fixed in a single location without changes elsewhere in the code. This is the cleanest and least disruptive solution.
Regards
Mike
Thanks, Leo
Hi Leo
On Fri, 5 Dec 2025 at 10:04, Leo Yan leo.yan@arm.com wrote:
Hi Mike,
On Thu, Dec 04, 2025 at 04:17:35PM +0000, Mike Leach wrote:
[...]
The tables in the patch are
[reg_type_array_index] = offset_address;e.g.
[INDEX_CTIINTACK] = QCOM_CTIINTACK
which resolves to
[1] = 0x020
where index is constant for a given register type,
As far as I can tell what you have suggested above is a table that is
[std_addr_offset] = qcom_addr_offset;
e.g.
[CTIINTACK] = QCOM_CTIINTACK,
which resolves to
[0x10] = 0x020
which does not appear to work correctly?
Sorry - what I mean here is the contiguous array that appears to be in the source, does not match the reality when compiled into memory - not that it doesn't work programmatically.
The registers are sparsely spread across the memory map, so a simple mapping does not work, even if we divide the original offset by 4 to create a register number.
This should work. Though the array is not filled for each item, but it will return back 0x20 when we access array[0x10], I don't see problem here.
The largest standard offset we have is ITTRIGIN = 0xEF8, so assuming the compiler allows us to sparselly populate the array (which I think it does, along with some padding), we end up with an array of at least 0xEF8 elements, rather then the indexed 21?
I tested locally and did not see the GCC complaint for this approach. And this is a global structure with about 16KiB (~4K items x
Which is precisely the issue - why use 16k bytes of space when a pair of indexed tables will use 21 x 32bit locations per table -> 168 bytes - 100x smaller!
This space matters little to high end server systems but is much more important in smaller embedded systems.
Moreover the table + inline helper is more efficient at extracting the correct offset value. The helper is a simple de-reference - whereas the helper functions you suggest require the code to make the comparison at every register access. The "if qcom ..." may be contained in one place in the source code, but is called and executed for every access.
Why add inefficiencies, either in footprint or execution?
Regards
Mike
sizeof(u32)), we don't need to worry about scaling issue as it is shared by device instances.
If you dislike this way, then a static function also can fulfill the same task, something like:
static noinline u32 cti_qcom_reg_off(u32 offset) { switch (offset) { CTIINTACK: return QCOM_CTIINTACK; CTIAPPSET: return QCOM_CTIAPPSET; ... default: WARN(1, "Unknown offset=%u\n", offset); return 0; } /* Should not run here, just for compiling */ return 0; }Thanks, Leo
Hi Mike,
On Mon, Dec 08, 2025 at 02:47:21PM +0000, Mike Leach wrote:
[...]
I tested locally and did not see the GCC complaint for this approach. And this is a global structure with about 16KiB (~4K items x
Which is precisely the issue - why use 16k bytes of space when a pair of indexed tables will use 21 x 32bit locations per table -> 168 bytes
- 100x smaller!
This space matters little to high end server systems but is much more important in smaller embedded systems.
For the concern of performance and footprint, my approach can avoid any conversion for standard registers, we end up need to convert registers for non-standard registers anyway.
I understand your concern for using an array for conversion, this is cost 16KiB memory but this can benefit a bit performance. It is a trade-off between memory and speed. As said, we can use a static function for register conversion, the side effect is this might cause more time.
Given the CTI MMIO register access, I don't think an extra branch instruction (checking the flag) would cause significant panelty, given the flag is set once at init and never changed afterwards.
Moreover the table + inline helper is more efficient at extracting the correct offset value. The helper is a simple de-reference - whereas the helper functions you suggest require the code to make the comparison at every register access. The "if qcom ..." may be contained in one place in the source code, but is called and executed for every access.
Why add inefficiencies, either in footprint or execution?
This is about how we design a driver that supports both a standard IP and non-standard implementations.
Because the standard IP is well defined, its register layout should be the default; it keeps the code simple and makes future CTI extensions easier. For non-standard IPs, we only apply the register translations needed.
TBH, the optimization topic is a bit over design for me now. The CTI module is configured once and remains untouched until it is disabled, so it is not a hot path.
Thanks, Leo