Synchronous Ethernet networks use a physical layer clock to syntonize the frequency across different network elements.
A basic SyncE node, defined in ITU-T G.8264, consists of an Ethernet Equipment Clock (EEC) and has the ability to recover synchronization from its synchronization inputs - either traffic interfaces or external frequency sources. The EEC can synchronize its frequency (syntonize) to any of those sources. It is also able to select the synchronization source through priority tables and synchronization status messaging, and it provides the necessary filtering and holdover capabilities.
This patch series introduces a basic interface for reading the Ethernet Equipment Clock (EEC) state on a SyncE-capable device. This state gives information about the source of the syntonization signal (either one of the device's own ports, or an external one) and the state of the EEC. This interface is required to implement Synchronization Status Messaging on upper layers.
Next steps:
 - add interface to enable source clocks and get information about them
 - properly return the EEC_SRC_PORT flag depending on the port recovered
   clock being enabled and locked
v2:
 - removed whitespace changes
 - fix issues reported by test robot
v3:
 - Changed naming from SyncE to EEC
 - Clarify cover letter and commit message for patch 1
v4:
 - Removed sync_source and pin_idx info
 - Changed one structure to attributes
 - Added EEC_SRC_PORT flag to indicate that the EEC is synchronized
   to the recovered clock of a port that returns the state
Maciej Machnikowski (2):
  rtnetlink: Add new RTM_GETEECSTATE message to get SyncE status
  ice: add support for reading SyncE DPLL state
 drivers/net/ethernet/intel/ice/ice.h          |  5 ++
 .../net/ethernet/intel/ice/ice_adminq_cmd.h   | 34 +++++++++
 drivers/net/ethernet/intel/ice/ice_common.c   | 62 ++++++++++++++++
 drivers/net/ethernet/intel/ice/ice_common.h   |  4 ++
 drivers/net/ethernet/intel/ice/ice_devids.h   |  3 +
 drivers/net/ethernet/intel/ice/ice_main.c     | 29 ++++++++
 drivers/net/ethernet/intel/ice/ice_ptp.c      | 35 +++++++++
 drivers/net/ethernet/intel/ice/ice_ptp_hw.c   | 44 ++++++++++++
 drivers/net/ethernet/intel/ice/ice_ptp_hw.h   | 22 ++++++
 include/linux/netdevice.h                     |  6 ++
 include/uapi/linux/if_link.h                  | 31 ++++++++
 include/uapi/linux/rtnetlink.h                |  3 +
 net/core/rtnetlink.c                          | 71 +++++++++++++++++++
 security/selinux/nlmsgtab.c                   |  3 +-
 14 files changed, 351 insertions(+), 1 deletion(-)
This patch series introduces a basic interface for reading the Ethernet Equipment Clock (EEC) state on a SyncE-capable device. This state gives information about the state of the EEC. This interface is required to implement Synchronization Status Messaging on upper layers.

The initial implementation returns the SyncE EEC state and flags attributes. The only flag currently implemented is EEC_SRC_PORT. When it is set, the EEC is synchronized to the clock recovered from the current port.

The SyncE EEC state read needs to be implemented as an ndo_get_eec_state function.
Signed-off-by: Maciej Machnikowski maciej.machnikowski@intel.com
---
 include/linux/netdevice.h      |  6 +++
 include/uapi/linux/if_link.h   | 31 +++++++++++++++
 include/uapi/linux/rtnetlink.h |  3 ++
 net/core/rtnetlink.c           | 71 ++++++++++++++++++++++++++++++++++
 security/selinux/nlmsgtab.c    |  3 +-
 5 files changed, 113 insertions(+), 1 deletion(-)
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 7c41593c1d6a..de78b8ed903c 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -1344,6 +1344,8 @@ struct netdev_net_notifier {
 *	The caller must be under RCU read context.
 * int (*ndo_fill_forward_path)(struct net_device_path_ctx *ctx, struct net_device_path *path);
 *	Get the forwarding path to reach the real device from the HW destination address
+ * int (*ndo_get_eec_state)(struct net_device *dev, enum if_eec_state *state, u32 *eec_flags, struct netlink_ext_ack *extack)
+ *	Get state of physical layer frequency syntonization (SyncE)
 */
struct net_device_ops {
	int (*ndo_init)(struct net_device *dev);
@@ -1563,6 +1565,10 @@ struct net_device_ops {
	struct net_device *	(*ndo_get_peer_dev)(struct net_device *dev);
	int			(*ndo_fill_forward_path)(struct net_device_path_ctx *ctx,
							 struct net_device_path *path);
+	int			(*ndo_get_eec_state)(struct net_device *dev,
+						     enum if_eec_state *state,
+						     u32 *eec_flags,
+						     struct netlink_ext_ack *extack);
 };
 /**

diff --git a/include/uapi/linux/if_link.h b/include/uapi/linux/if_link.h
index eebd3894fe89..78a8a5af8cd8 100644
--- a/include/uapi/linux/if_link.h
+++ b/include/uapi/linux/if_link.h
@@ -1273,4 +1273,35 @@ enum {
#define IFLA_MCTP_MAX (__IFLA_MCTP_MAX - 1)
+/* SyncE section */
+
+enum if_eec_state {
+	IF_EEC_STATE_INVALID = 0,
+	IF_EEC_STATE_FREERUN,
+	IF_EEC_STATE_LOCKACQ,
+	IF_EEC_STATE_LOCKREC,
+	IF_EEC_STATE_LOCKED,
+	IF_EEC_STATE_HOLDOVER,
+	IF_EEC_STATE_OPEN_LOOP,
+	__IF_EEC_STATE_MAX,
+};
+
+#define IF_EEC_STATE_MAX	(__IF_EEC_STATE_MAX - 1)
+#define EEC_SRC_PORT		(1 << 0) /* recovered clock from the port is
+					  * currently the source for the EEC
+					  */
+
+struct if_eec_state_msg {
+	__u32 ifindex;
+};
+
+enum {
+	IFLA_EEC_UNSPEC,
+	IFLA_EEC_STATE,
+	IFLA_EEC_FLAGS,
+	__IFLA_EEC_MAX,
+};
+
+#define IFLA_EEC_MAX (__IFLA_EEC_MAX - 1)
+
 #endif /* _UAPI_LINUX_IF_LINK_H */

diff --git a/include/uapi/linux/rtnetlink.h b/include/uapi/linux/rtnetlink.h
index 5888492a5257..642ac18a09d9 100644
--- a/include/uapi/linux/rtnetlink.h
+++ b/include/uapi/linux/rtnetlink.h
@@ -185,6 +185,9 @@ enum {
 	RTM_GETNEXTHOPBUCKET,
 #define RTM_GETNEXTHOPBUCKET RTM_GETNEXTHOPBUCKET
+	RTM_GETEECSTATE = 120,
+#define RTM_GETEECSTATE RTM_GETEECSTATE
+
 	__RTM_MAX,
 #define RTM_MAX	(((__RTM_MAX + 3) & ~3) - 1)
 };

diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
index 972c8cb303a5..11d4e96d3371 100644
--- a/net/core/rtnetlink.c
+++ b/net/core/rtnetlink.c
@@ -5468,6 +5468,75 @@ static int rtnl_stats_dump(struct sk_buff *skb, struct netlink_callback *cb)
 	return skb->len;
 }
+static int rtnl_fill_eec_state(struct sk_buff *skb, struct net_device *dev,
+			       u32 portid, u32 seq, struct netlink_callback *cb,
+			       int flags, struct netlink_ext_ack *extack)
+{
+	const struct net_device_ops *ops = dev->netdev_ops;
+	struct if_eec_state_msg *state_msg;
+	enum if_eec_state state;
+	struct nlmsghdr *nlh;
+	u32 eec_flags;
+	int err;
+
+	ASSERT_RTNL();
+
+	if (!ops->ndo_get_eec_state)
+		return -EOPNOTSUPP;
+
+	err = ops->ndo_get_eec_state(dev, &state, &eec_flags, extack);
+	if (err)
+		return err;
+
+	nlh = nlmsg_put(skb, portid, seq, RTM_GETEECSTATE, sizeof(*state_msg),
+			flags);
+	if (!nlh)
+		return -EMSGSIZE;
+
+	state_msg = nlmsg_data(nlh);
+	state_msg->ifindex = dev->ifindex;
+
+	if (nla_put(skb, IFLA_EEC_STATE, sizeof(state), &state) ||
+	    nla_put_u32(skb, IFLA_EEC_FLAGS, eec_flags))
+		return -EMSGSIZE;
+
+	nlmsg_end(skb, nlh);
+	return 0;
+}
+
+static int rtnl_eec_state_get(struct sk_buff *skb, struct nlmsghdr *nlh,
+			      struct netlink_ext_ack *extack)
+{
+	struct net *net = sock_net(skb->sk);
+	struct if_eec_state_msg *state;
+	struct net_device *dev;
+	struct sk_buff *nskb;
+	int err;
+
+	state = nlmsg_data(nlh);
+	if (state->ifindex > 0)
+		dev = __dev_get_by_index(net, state->ifindex);
+	else
+		return -EINVAL;
+
+	if (!dev)
+		return -ENODEV;
+
+	nskb = nlmsg_new(NLMSG_DEFAULT_SIZE, GFP_KERNEL);
+	if (!nskb)
+		return -ENOBUFS;
+
+	err = rtnl_fill_eec_state(nskb, dev, NETLINK_CB(skb).portid,
+				  nlh->nlmsg_seq, NULL, nlh->nlmsg_flags,
+				  extack);
+	if (err < 0)
+		kfree_skb(nskb);
+	else
+		err = rtnl_unicast(nskb, net, NETLINK_CB(skb).portid);
+
+	return err;
+}
+
 /* Process one rtnetlink message. */
 static int rtnetlink_rcv_msg(struct sk_buff *skb, struct nlmsghdr *nlh,
@@ -5693,4 +5762,6 @@ void __init rtnetlink_init(void)
 	rtnl_register(PF_UNSPEC, RTM_GETSTATS, rtnl_stats_get,
 		      rtnl_stats_dump, 0);
+
+	rtnl_register(PF_UNSPEC, RTM_GETEECSTATE, rtnl_eec_state_get, NULL, 0);
 }

diff --git a/security/selinux/nlmsgtab.c b/security/selinux/nlmsgtab.c
index d59276f48d4f..0e2d84d6045f 100644
--- a/security/selinux/nlmsgtab.c
+++ b/security/selinux/nlmsgtab.c
@@ -91,6 +91,7 @@ static const struct nlmsg_perm nlmsg_route_perms[] =
 	{ RTM_NEWNEXTHOPBUCKET,	NETLINK_ROUTE_SOCKET__NLMSG_WRITE },
 	{ RTM_DELNEXTHOPBUCKET,	NETLINK_ROUTE_SOCKET__NLMSG_WRITE },
 	{ RTM_GETNEXTHOPBUCKET,	NETLINK_ROUTE_SOCKET__NLMSG_READ  },
+	{ RTM_GETEECSTATE,	NETLINK_ROUTE_SOCKET__NLMSG_READ  },
 };
 static const struct nlmsg_perm nlmsg_tcpdiag_perms[] =
@@ -174,7 +175,7 @@ int selinux_nlmsg_lookup(u16 sclass, u16 nlmsg_type, u32 *perm)
 	 * structures at the top of this file with the new mappings
 	 * before updating the BUILD_BUG_ON() macro!
 	 */
-	BUILD_BUG_ON(RTM_MAX != (RTM_NEWNEXTHOPBUCKET + 3));
+	BUILD_BUG_ON(RTM_MAX != (RTM_GETEECSTATE + 3));

 	err = nlmsg_perm(nlmsg_type, perm, nlmsg_route_perms,
 			 sizeof(nlmsg_route_perms));
 	break;
On Fri, 3 Sep 2021 17:14:35 +0200 Maciej Machnikowski maciej.machnikowski@intel.com wrote:
This patch series introduces basic interface for reading the Ethernet Equipment Clock (EEC) state on a SyncE capable device. This state gives information about the state of EEC. This interface is required to implement Synchronization Status Messaging on upper layers.
Initial implementation returns SyncE EEC state and flags attributes. The only flag currently implemented is the EEC_SRC_PORT. When it's set the EEC is synchronized to the recovered clock recovered from the current port.
SyncE EEC state read needs to be implemented as a ndo_get_eec_state function.
Signed-off-by: Maciej Machnikowski maciej.machnikowski@intel.com
Is there a simpler way to do this? Seems like you are adding a lot for a use case specific to a small class of devices. For example, adding a new network device operation adds a small amount of bloat to every other network device in the kernel.
-----Original Message-----
From: Stephen Hemminger stephen@networkplumber.org
Sent: Friday, September 3, 2021 6:19 PM
Subject: Re: [PATCH net-next 1/2] rtnetlink: Add new RTM_GETEECSTATE message to get SyncE status
On Fri, 3 Sep 2021 17:14:35 +0200 Maciej Machnikowski maciej.machnikowski@intel.com wrote:
This patch series introduces basic interface for reading the Ethernet Equipment Clock (EEC) state on a SyncE capable device. This state gives information about the state of EEC. This interface is required to implement Synchronization Status Messaging on upper layers.
Initial implementation returns SyncE EEC state and flags attributes. The only flag currently implemented is the EEC_SRC_PORT. When it's set the EEC is synchronized to the recovered clock recovered from the current port.
SyncE EEC state read needs to be implemented as a ndo_get_eec_state function.
Signed-off-by: Maciej Machnikowski maciej.machnikowski@intel.com
Is there a simpler way to do this? Seems like you are adding a lot for a use case specific to a small class of devices. For example adding a new network device operation adds small amount of bloat to every other network device in the kernel.
Hi!
I couldn't find any simpler way. Do you have something specific in mind? A function pointer is only a u64. I can hardly think of anything smaller.
Regards
Maciek
On Fri, 3 Sep 2021 17:14:35 +0200 Maciej Machnikowski wrote:
This patch series introduces basic interface for reading the Ethernet Equipment Clock (EEC) state on a SyncE capable device. This state gives information about the state of EEC. This interface is required to implement Synchronization Status Messaging on upper layers.
Initial implementation returns SyncE EEC state and flags attributes. The only flag currently implemented is the EEC_SRC_PORT. When it's set the EEC is synchronized to the recovered clock recovered from the current port.
SyncE EEC state read needs to be implemented as a ndo_get_eec_state function.
Signed-off-by: Maciej Machnikowski maciej.machnikowski@intel.com
Since we're talking SyncE-only now my intuition would be to put this op in ethtool. Was there a reason ethtool was not chosen? If not what do others think / if yes can the reason be included in the commit message?
+/* SyncE section */
+
+enum if_eec_state {
+	IF_EEC_STATE_INVALID = 0,
+	IF_EEC_STATE_FREERUN,
+	IF_EEC_STATE_LOCKACQ,
+	IF_EEC_STATE_LOCKREC,
+	IF_EEC_STATE_LOCKED,
+	IF_EEC_STATE_HOLDOVER,
+	IF_EEC_STATE_OPEN_LOOP,
+	__IF_EEC_STATE_MAX,
+};
+
+#define IF_EEC_STATE_MAX (__IF_EEC_STATE_MAX - 1)
You don't need MAX for an output-only enum; the MAX define in netlink is usually for attributes, so user space knows how to size attribute arrays.
+#define EEC_SRC_PORT (1 << 0) /* recovered clock from the port is
+			       * currently the source for the EEC
+			       */
Why include it then? Just leave the value out and if the attr is not present user space should assume the source is port.
+struct if_eec_state_msg {
+	__u32 ifindex;
+};
+
+enum {
+	IFLA_EEC_UNSPEC,
+	IFLA_EEC_STATE,
+	IFLA_EEC_FLAGS,
With "SRC_PORT" gone there's no reason for flags at this point.
+	__IFLA_EEC_MAX,
+};
+
+#define IFLA_EEC_MAX (__IFLA_EEC_MAX - 1)
+static int rtnl_fill_eec_state(struct sk_buff *skb, struct net_device *dev,
+			       u32 portid, u32 seq, struct netlink_callback *cb,
+			       int flags, struct netlink_ext_ack *extack)
+{
+	const struct net_device_ops *ops = dev->netdev_ops;
+	struct if_eec_state_msg *state_msg;
+	enum if_eec_state state;
+	struct nlmsghdr *nlh;
+	u32 eec_flags;
+	int err;
+
+	ASSERT_RTNL();
+
+	if (!ops->ndo_get_eec_state)
+		return -EOPNOTSUPP;
+
+	err = ops->ndo_get_eec_state(dev, &state, &eec_flags, extack);
+	if (err)
+		return err;
+
+	nlh = nlmsg_put(skb, portid, seq, RTM_GETEECSTATE, sizeof(*state_msg),
+			flags);
+	if (!nlh)
+		return -EMSGSIZE;
+
+	state_msg = nlmsg_data(nlh);
+	state_msg->ifindex = dev->ifindex;
+
+	if (nla_put(skb, IFLA_EEC_STATE, sizeof(state), &state) ||
This should be a u32.
+	    nla_put_u32(skb, IFLA_EEC_FLAGS, eec_flags))
+		return -EMSGSIZE;
+
+	nlmsg_end(skb, nlh);
+	return 0;
+}
+static int rtnl_eec_state_get(struct sk_buff *skb, struct nlmsghdr *nlh,
+			      struct netlink_ext_ack *extack)
+{
+	struct net *net = sock_net(skb->sk);
+	struct if_eec_state_msg *state;
+	struct net_device *dev;
+	struct sk_buff *nskb;
+	int err;
+
+	state = nlmsg_data(nlh);
+	if (state->ifindex > 0)
+		dev = __dev_get_by_index(net, state->ifindex);
+	else
+		return -EINVAL;
+
+	if (!dev)
+		return -ENODEV;
I think I pointed this out already, this is more natural without the else branch:
	if (ifindex <= 0)
		return -EINVAL;

	dev = ...
	if (!dev)
		return -ENOENT;
or don't check the ifindex at all and let dev_get_by.. fail.
Thanks for pushing this API forward!
-----Original Message-----
From: Jakub Kicinski kuba@kernel.org
Sent: Saturday, September 4, 2021 12:14 AM
Subject: Re: [PATCH net-next 1/2] rtnetlink: Add new RTM_GETEECSTATE message to get SyncE status
On Fri, 3 Sep 2021 17:14:35 +0200 Maciej Machnikowski wrote:
This patch series introduces basic interface for reading the Ethernet Equipment Clock (EEC) state on a SyncE capable device. This state gives information about the state of EEC. This interface is required to implement Synchronization Status Messaging on upper layers.
Initial implementation returns SyncE EEC state and flags attributes. The only flag currently implemented is the EEC_SRC_PORT. When it's set the EEC is synchronized to the recovered clock recovered from the current port.
SyncE EEC state read needs to be implemented as a ndo_get_eec_state function.
Signed-off-by: Maciej Machnikowski maciej.machnikowski@intel.com
Since we're talking SyncE-only now my intuition would be to put this op in ethtool. Was there a reason ethtool was not chosen? If not what do others think / if yes can the reason be included in the commit message?
Hmm. Main reason for netlink is that linuxptp already supports it, and it was suggested by Richard. Having an NDO would also make it easier to add a SyncE-related files to the sysfs for easier operation (following the ideas from the ptp pins subsystem). But I'm open for suggestions.
+#define EEC_SRC_PORT (1 << 0) /* recovered clock from the port is
+			       * currently the source for the EEC
+			       */
Why include it then? Just leave the value out and if the attr is not present user space should assume the source is port.
This bit has a different meaning. If it's set the port in question is a frequency source for the multiport device, if it's cleared - some other source is used as a source. This is needed to prevent setting invalid configurations in the PHY (like setting the frequency source as a Master in AN) or sending invalid messages. If the port is a frequency source it must always send back QL-DNU messages to prevent synchronization loops.
or don't check the ifindex at all and let dev_get_by.. fail.
Thanks for pushing this API forward!
Addressed all other comments - and thanks for giving a lot of helpful suggestions!
Regards
Maciek
On Mon, 6 Sep 2021 18:30:40 +0000 Machnikowski, Maciej wrote:
-----Original Message----- From: Jakub Kicinski kuba@kernel.org Sent: Saturday, September 4, 2021 12:14 AM Subject: Re: [PATCH net-next 1/2] rtnetlink: Add new RTM_GETEECSTATE message to get SyncE status
On Fri, 3 Sep 2021 17:14:35 +0200 Maciej Machnikowski wrote:
This patch series introduces basic interface for reading the Ethernet Equipment Clock (EEC) state on a SyncE capable device. This state gives information about the state of EEC. This interface is required to implement Synchronization Status Messaging on upper layers.
Initial implementation returns SyncE EEC state and flags attributes. The only flag currently implemented is the EEC_SRC_PORT. When it's set the EEC is synchronized to the recovered clock recovered from the current port.
SyncE EEC state read needs to be implemented as a ndo_get_eec_state function.
Signed-off-by: Maciej Machnikowski maciej.machnikowski@intel.com
Since we're talking SyncE-only now my intuition would be to put this op in ethtool. Was there a reason ethtool was not chosen? If not what do others think / if yes can the reason be included in the commit message?
Hmm. Main reason for netlink is that linuxptp already supports it, and it was suggested by Richard. Having an NDO would also make it easier to add a SyncE-related files to the sysfs for easier operation (following the ideas from the ptp pins subsystem). But I'm open for suggestions.
I think linuxptp will need support for ethtool netlink sockets sooner rather than later. Moving this to ethtool makes sense to me since it's very much an Ethernet-oriented API at this point.
+#define EEC_SRC_PORT (1 << 0) /* recovered clock from the port is
+			       * currently the source for the EEC
+			       */
Why include it then? Just leave the value out and if the attr is not present user space should assume the source is port.
This bit has a different meaning. If it's set the port in question is a frequency source for the multiport device, if it's cleared - some other source is used as a source. This is needed to prevent setting invalid configurations in the PHY (like setting the frequency source as a Master in AN) or sending invalid messages. If the port is a frequency source it must always send back QL-DNU messages to prevent synchronization loops.
Ah! I see. Is being the "source" negotiated somehow? Don't we need to give the user / linuxptp to select the source based on whatever info it has about topology?
or don't check the ifindex at all and let dev_get_by.. fail.
Thanks for pushing this API forward!
Addressed all other comments - and thanks for giving a lot of helpful suggestions!
Thanks, BTW I think I forgot to ask for documentation, dumping info about the API and context under Documentation/ would be great!
-----Original Message-----
From: Jakub Kicinski kuba@kernel.org
Sent: Monday, September 6, 2021 8:39 PM
To: Machnikowski, Maciej maciej.machnikowski@intel.com
Subject: Re: [PATCH net-next 1/2] rtnetlink: Add new RTM_GETEECSTATE message to get SyncE status
On Mon, 6 Sep 2021 18:30:40 +0000 Machnikowski, Maciej wrote:
-----Original Message----- From: Jakub Kicinski kuba@kernel.org Sent: Saturday, September 4, 2021 12:14 AM Subject: Re: [PATCH net-next 1/2] rtnetlink: Add new RTM_GETEECSTATE message to get SyncE status
On Fri, 3 Sep 2021 17:14:35 +0200 Maciej Machnikowski wrote:
This patch series introduces basic interface for reading the Ethernet Equipment Clock (EEC) state on a SyncE capable device. This state gives information about the state of EEC. This interface is required to implement Synchronization Status Messaging on upper layers.

Initial implementation returns SyncE EEC state and flags attributes. The only flag currently implemented is the EEC_SRC_PORT. When it's set the EEC is synchronized to the recovered clock recovered from the current port.

SyncE EEC state read needs to be implemented as a ndo_get_eec_state function.
Signed-off-by: Maciej Machnikowski
Since we're talking SyncE-only now my intuition would be to put this op in ethtool. Was there a reason ethtool was not chosen? If not what do others think / if yes can the reason be included in the commit message?
Hmm. Main reason for netlink is that linuxptp already supports it, and it was suggested by Richard. Having an NDO would also make it easier to add a SyncE-related files to the sysfs for easier operation (following the ideas from the ptp pins subsystem). But I'm open for suggestions.
I think linuxptp will need support for ethtool netlink sockets sooner rather than later. Moving this to ethtool makes sense to me since it's very much a Ethernet-oriented API at this point.
Ethtool also makes a lot of sense, but would it still be possible to add the sysfs files? They make sense for some deployments (more on that below).
+#define EEC_SRC_PORT (1 << 0) /* recovered clock from the port is
+			       * currently the source for the EEC
+			       */
Why include it then? Just leave the value out and if the attr is not present user space should assume the source is port.
This bit has a different meaning. If it's set the port in question is a frequency source for the multiport device, if it's cleared - some other source is used as a source. This is needed to prevent setting invalid configurations in the PHY (like setting the frequency source as a Master in AN) or sending invalid messages. If the port is a frequency source it must always send back QL-DNU messages to prevent synchronization loops.
Ah! I see. Is being the "source" negotiated somehow? Don't we need to give the user / linuxptp to select the source based on whatever info it has about topology?
The frequency source can be either pre-set statically, negotiated using ESMC QL-levels (if working in QL-Enabled mode), or follow automatic fallback inside the device. This flag gives feedback about the validity of recovered clock coming from a given port and is useful when you enable multiple recovered clocks on more than one port in active-passive model. In that case the "driving" port may change dynamically, so it's a good idea to have some interface to reflect that.
That's where a sysfs file would be useful. When I add the implementation for recovered clock configuration, the sysfs may be used as a standalone interface for configuring them when no dynamic change is needed.
or don't check the ifindex at all and let dev_get_by.. fail.
Thanks for pushing this API forward!
Addressed all other comments - and thanks for giving a lot of helpful suggestions!
Thanks, BTW I think I forgot to ask for documentation, dumping info about the API and context under Documentation/ would be great!
Could you suggest where to add that? Grepping for ndo_ doesn't turn up much. I can add a new synce.rst file if it makes sense.
On Mon, 6 Sep 2021 19:01:54 +0000 Machnikowski, Maciej wrote:
Hmm. Main reason for netlink is that linuxptp already supports it, and it was suggested by Richard. Having an NDO would also make it easier to add a SyncE-related files to the sysfs for easier operation (following the ideas from the ptp pins subsystem). But I'm open for suggestions.
I think linuxptp will need support for ethtool netlink sockets sooner rather than later. Moving this to ethtool makes sense to me since it's very much a Ethernet-oriented API at this point.
Ethtool also makes a lot of sense, but will it be possible to still make sysfs, and it makes sense to add it for some deployments (more on that below)
It should not make much difference whether ndo or ethtool op is used.
This bit has a different meaning. If it's set the port in question is a frequency source for the multiport device, if it's cleared - some other source is used as a source. This is needed to prevent setting invalid configurations in the PHY (like setting the frequency source as a Master in AN) or sending invalid messages. If the port is a frequency source it must always send back QL-DNU messages to prevent synchronization loops.
Ah! I see. Is being the "source" negotiated somehow? Don't we need to give the user / linuxptp to select the source based on whatever info it has about topology?
The frequency source can be either pre-set statically, negotiated using ESMC QL-levels (if working in QL-Enabled mode), or follow automatic fallback inside the device. This flag gives feedback about the validity of recovered clock coming from a given port and is useful when you enable multiple recovered clocks on more than one port in active-passive model. In that case the "driving" port may change dynamically, so it's a good idea to have some interface to reflect that.
The ESMC messages are handled by Linux or some form of firmware? I don't see how you can implement any selection policy with a read-only API.
In general it would be more natural to place a "source id" at the DPLL/clock, the "source" flag seems to mark the wrong end of the relationship. If there ever are multiple consumers we won't be able to tell which "target" the "source" is referring to. Hard to judge how much of a problem that could be by looking at a small slice of the system.
That's where sysfs file be useful. When I add the implementation for recovered clock configuration, the sysfs may be used as standalone interface for configuring them when no dynamic change is needed.
I didn't get that. Do you mean using a sysfs file to configure the parameters of the DPLL?
If the DPLL has its own set of concerns we should go ahead and create explicit object / configuration channel for it.
Somehow I got it into my head that you care mostly about transmitting the clock, IOW recovering it from one port and using on another but that's probably not even a strong use case for you or NICs in general :S
Addressed all other comments - and thanks for giving a lot of helpful suggestions!
Thanks, BTW I think I forgot to ask for documentation, dumping info about the API and context under Documentation/ would be great!
Could you suggest where to add that? Grepping for ndo_ don't give much. I can add a new synce.rst file if it makes sense.
New networking/synce.rst file makes perfect sense to me. And perhaps link to it from driver-api/ptp.rst.
-----Original Message-----
From: Jakub Kicinski kuba@kernel.org
Sent: Tuesday, September 7, 2021 3:01 AM
To: Machnikowski, Maciej maciej.machnikowski@intel.com
Subject: Re: [PATCH net-next 1/2] rtnetlink: Add new RTM_GETEECSTATE message to get SyncE status
On Mon, 6 Sep 2021 19:01:54 +0000 Machnikowski, Maciej wrote:
Hmm. Main reason for netlink is that linuxptp already supports it, and it was suggested by Richard. Having an NDO would also make it easier to add a SyncE-related files to the sysfs for easier operation (following the ideas from the ptp pins subsystem). But I'm open for suggestions.
I think linuxptp will need support for ethtool netlink sockets sooner rather than later. Moving this to ethtool makes sense to me since it's very much a Ethernet-oriented API at this point.
Ethtool also makes a lot of sense, but will it be possible to still make sysfs, and it makes sense to add it for some deployments (more on that below)
It should not make much difference whether ndo or ethtool op is used.
Will look into it then.
This bit has a different meaning. If it's set the port in question is a frequency source for the multiport device, if it's cleared - some other source is used as a source. This is needed to prevent setting invalid configurations in the PHY (like setting the frequency source as a Master in AN) or sending invalid messages. If the port is a frequency source it must always send back QL-DNU messages to prevent synchronization loops.
Ah! I see. Is being the "source" negotiated somehow? Don't we need to give the user / linuxptp to select the source based on whatever info it has about topology?
The frequency source can be either pre-set statically, negotiated using ESMC QL-levels (if working in QL-Enabled mode), or follow automatic fallback inside the device. This flag gives feedback about the validity of recovered clock coming from a given port and is useful when you enable multiple recovered clocks on more than one port in active-passive model. In that case the "driving" port may change dynamically, so it's a good idea to have some interface to reflect that.
The ESMC messages are handled by Linux or some form of firmware? I don't see how you can implement any selection policy with a read-only API.
It can be either in FW or in Linux - depending on the deployment. We try to define the API that would enable Linux to manage that. EEC state will be read-only, but the recovered clock management part will allow changes for QL-disabled SyncE deployments that only need to see if the clock they receive on a given port is valid or not.
In general it would be more natural to place a "source id" at the DPLL/clock, the "source" flag seems to mark the wrong end of the relationship. If there ever are multiple consumers we won't be able to tell which "target" the "source" is referring to. Hard to judge how much of a problem that could be by looking at a small slice of the system.
The DPLL will operate on pins, so it will have a pin connected from the MAC/PHY that will have the recovered clock, but the recovered clock can be enabled from any port/lane. That information is kept in the MAC/PHY and the DPLL side will not be aware who it belongs to.
We can come up with a better name, but think of it like: You have multiport device (switch/NIC). One port is recovering the clock, the PHY/MAC outputs that clock through the pin to the EEC (DPLL). The DPLL knows if it locked to the signal coming from the multiport PHY/MAC, but it doesn't know which port is the one that generates that clock signal. All other ports can also present the "locked" state, but they are following the clock that was received in the chosen port. If we drop this flag we won't be able to easily tell which port/lane drives the recovered clock. In short: the port with that flag on is following the network clock and leading clock of other ports of the multiport device.
In the most basic SyncE deployment you can put the passive DPLL that will only give you the lock/holdover/unlocked info and just use this flag to know who currently drives the DPLL.
That's where sysfs file be useful. When I add the implementation for recovered clock configuration, the sysfs may be used as standalone interface for configuring them when no dynamic change is needed.
I didn't get that. Do you mean using a sysfs file to configure the parameters of the DPLL?
Only the PHY/MAC side of things, which is the recovered clock configuration and the EEC state.
If the DPLL has its own set of concerns we should go ahead and create explicit object / configuration channel for it.
Somehow I got it into my head that you care mostly about transmitting the clock, IOW recovering it from one port and using on another but that's probably not even a strong use case for you or NICs in general :S
This is the right thinking. The DPLL can also have different external sources, like the GNSS, and can also drive different output clocks. But for the most basic SyncE implementation, which only runs on a recovered clock, we won't need the DPLL subsystem.
Addressed all other comments - and thanks for giving a lot of helpful suggestions!
Thanks, BTW I think I forgot to ask for documentation, dumping info about the API and context under Documentation/ would be great!
Could you suggest where to add that? Grepping for ndo_ don't give much. I can add a new synce.rst file if it makes sense.
New networking/synce.rst file makes perfect sense to me. And perhaps link to it from driver-api/ptp.rst.
OK will try to come up with something there
On Tue, 7 Sep 2021 08:50:55 +0000 Machnikowski, Maciej wrote:
The frequency source can be either pre-set statically, negotiated using ESMC QL-levels (if working in QL-enabled mode), or follow automatic fallback inside the device. This flag gives feedback about the validity of the recovered clock coming from a given port and is useful when you enable multiple recovered clocks on more than one port in an active-passive model. In that case the "driving" port may change dynamically, so it's a good idea to have some interface to reflect that.
The ESMC messages are handled by Linux or some form of firmware? I don't see how you can implement any selection policy with a read-only API.
It can be either in FW or in Linux - depending on the deployment. We try to define the API that would enable Linux to manage that.
We should implement the API for Linux to manage things from the get go.
EEC state will be read-only, but the recovered clock management part will allow changes for QL-disabled SyncE deployments that only need to see if the clock they receive on a given port is valid or not.
In general it would be more natural to place a "source id" at the DPLL/clock, the "source" flag seems to mark the wrong end of the relationship. If there ever are multiple consumers we won't be able to tell which "target" the "source" is referring to. Hard to judge how much of a problem that could be by looking at a small slice of the system.
The DPLL will operate on pins, so it will have a pin connected from the MAC/PHY that will have the recovered clock, but the recovered clock can be enabled from any port/lane. That information is kept in the MAC/PHY and the DPLL side will not be aware who it belongs to.
So the clock outputs are muxed to a single pin at the Ethernet IP level, in your design. I wonder if this is the common implementation and therefore if it's safe to bake that into the API. Input from other vendors would be great...
Also do I understand correctly that the output of the Ethernet IP is just the raw Rx clock once receiver is locked and the DPLL which enum if_synce_state refers to is in the time IP, that DPLL could be driven by GNSS etc?
We can come up with a better name, but think of it like: You have a multiport device (switch/NIC). One port is recovering the clock, the PHY/MAC outputs that clock through the pin to the EEC (DPLL). The DPLL knows if it locked to the signal coming from the multiport PHY/MAC, but it doesn't know which port is the one that generates that clock signal. All other ports can also present the "locked" state, but they are following the clock that was received on the chosen port. If we drop this flag we won't be able to easily tell which port/lane drives the recovered clock. In short: the port with that flag on is following the network clock and leading the clocks of the other ports of the multiport device.
In the most basic SyncE deployment you can put the passive DPLL that will only give you the lock/holdover/unlocked info and just use this flag to know who currently drives the DPLL.
That's where a sysfs file would be useful. When I add the implementation for recovered clock configuration, the sysfs may be used as a standalone interface for configuring them when no dynamic change is needed.
I didn't get that. Do you mean using a sysfs file to configure the parameters of the DPLL?
Only the PHY/MAC side of things, which is recovered clock configuration and the EEC state.
If the DPLL has its own set of concerns we should go ahead and create explicit object / configuration channel for it.
Somehow I got it into my head that you care mostly about transmitting the clock, IOW recovering it from one port and using on another but that's probably not even a strong use case for you or NICs in general :S
This is the right thinking. The DPLL can also have different external sources, like the GNSS, and can also drive different output clocks. But for the most basic SyncE implementation, which only runs on a recovered clock, we won't need the DPLL subsystem.
The GNSS pulse would come in over an external pin, tho, right? Your earlier version of the patchset had GNSS as an enum value, how would the driver / FW know that a given pin means GNSS?
Could you suggest where to add that? Grepping for ndo_ doesn't give much. I can add a new synce.rst file if it makes sense.
New networking/synce.rst file makes perfect sense to me. And perhaps link to it from driver-api/ptp.rst.
OK will try to come up with something there
-----Original Message----- From: Jakub Kicinski kuba@kernel.org Sent: Tuesday, September 7, 2021 4:55 PM To: Machnikowski, Maciej maciej.machnikowski@intel.com Cc: netdev@vger.kernel.org; intel-wired-lan@lists.osuosl.org; richardcochran@gmail.com; abyagowi@fb.com; Nguyen, Anthony L anthony.l.nguyen@intel.com; davem@davemloft.net; linux-kselftest@vger.kernel.org; Andrew Lunn andrew@lunn.ch; Michal Kubecek mkubecek@suse.cz; Saeed Mahameed saeed@kernel.org; Michael Chan michael.chan@broadcom.com Subject: Re: [PATCH net-next 1/2] rtnetlink: Add new RTM_GETEECSTATE message to get SyncE status
On Tue, 7 Sep 2021 08:50:55 +0000 Machnikowski, Maciej wrote:
The frequency source can be either pre-set statically, negotiated using ESMC QL-levels (if working in QL-enabled mode), or follow automatic fallback inside the device. This flag gives feedback about the validity of the recovered clock coming from a given port and is useful when you enable multiple recovered clocks on more than one port in an active-passive model. In that case the "driving" port may change dynamically, so it's a good idea to have some interface to reflect that.
The ESMC messages are handled by Linux or some form of firmware? I don't see how you can implement any selection policy with a read-only API.
It can be either in FW or in Linux - depending on the deployment. We try to define the API that would enable Linux to manage that.
We should implement the API for Linux to manage things from the get go.
Yep! Yet let's go one step at a time. I believe once we have the basics (EEC monitoring and recovered clock configuration) we'll be able to implement a basic functionality - and add bells and whistles later on, as there are more capabilities that we could support in SW.
EEC state will be read-only, but the recovered clock management part will allow changes for QL-disabled SyncE deployments that only need to see if the clock they receive on a given port is valid or not.
In general it would be more natural to place a "source id" at the DPLL/clock, the "source" flag seems to mark the wrong end of the relationship. If there ever are multiple consumers we won't be able to tell which "target" the "source" is referring to. Hard to judge how much of a problem that could be by looking at a small slice of the system.
The DPLL will operate on pins, so it will have a pin connected from the MAC/PHY that will have the recovered clock, but the recovered clock can be enabled from any port/lane. That information is kept in the MAC/PHY and the DPLL side will not be aware who it belongs to.
So the clock outputs are muxed to a single pin at the Ethernet IP level, in your design. I wonder if this is the common implementation and therefore if it's safe to bake that into the API. Input from other vendors would be great...
I believe this is the state-of-the-art: here's the Broadcom public one https://docs.broadcom.com/doc/1211168567832, and I believe Marvell has a similar solution. But I would also be happy to hear from others.
Also do I understand correctly that the output of the Ethernet IP is just the raw Rx clock once receiver is locked and the DPLL which enum if_synce_state refers to is in the time IP, that DPLL could be driven by GNSS etc?
Ethernet IP/PHY usually outputs a divided clock signal (since it's easier to route) derived from the RX clock. The DPLL connectivity is vendor-specific, as you can use it to connect some external signals, but you can as well just care about relaying the SyncE clock and only allow recovering it and passing along the QL info when your EEC is locked. That's why I backed off from a full DPLL implementation in favor of a more generic EEC clock. The Time IP is again relative and vendor-specific. If SyncE is deployed alongside PTP it will most likely be tightly coupled, but if you only care about having a frequency source - it's not mandatory and it can as well be in the PHY IP.
Also I think I will strip the reported states to the bare minimum defined in the ITU-T G.781 instead of reusing the states that were already defined for a specific DPLL.
We can come up with a better name, but think of it like: You have a multiport device (switch/NIC). One port is recovering the clock, the PHY/MAC outputs that clock through the pin to the EEC (DPLL). The DPLL knows if it locked to the signal coming from the multiport PHY/MAC, but it doesn't know which port is the one that generates that clock signal. All other ports can also present the "locked" state, but they are following the clock that was received on the chosen port. If we drop this flag we won't be able to easily tell which port/lane drives the recovered clock. In short: the port with that flag on is following the network clock and leading the clocks of the other ports of the multiport device.
In the most basic SyncE deployment you can put the passive DPLL that will only give you the lock/holdover/unlocked info and just use this flag to know who currently drives the DPLL.
That's where a sysfs file would be useful. When I add the implementation for recovered clock configuration, the sysfs may be used as a standalone interface for configuring them when no dynamic change is needed.
I didn't get that. Do you mean using a sysfs file to configure the parameters of the DPLL?
Only the PHY/MAC side of things, which is recovered clock configuration and the EEC state.
If the DPLL has its own set of concerns we should go ahead and create explicit object / configuration channel for it.
Somehow I got it into my head that you care mostly about transmitting the clock, IOW recovering it from one port and using on another but that's probably not even a strong use case for you or NICs in general :S
This is the right thinking. The DPLL can also have different external sources, like the GNSS, and can also drive different output clocks. But for the most basic SyncE implementation, which only runs on a recovered clock, we won't need the DPLL subsystem.
The GNSS pulse would come in over an external pin, tho, right? Your earlier version of the patchset had GNSS as an enum value, how would the driver / FW know that a given pin means GNSS?
The GNSS 1PPS will more likely go directly to the "full" DPLL. The pin topology can be derived from FW or any vendor-specific way of mapping pins to their sources. And, in the "worst" case, it can just be hardcoded for a specific device.
Could you suggest where to add that? Grepping for ndo_ doesn't give much. I can add a new synce.rst file if it makes sense.
New networking/synce.rst file makes perfect sense to me. And perhaps link to it from driver-api/ptp.rst.
OK will try to come up with something there
On Tue, 7 Sep 2021 15:47:05 +0000 Machnikowski, Maciej wrote:
It can be either in FW or in Linux - depending on the deployment. We try to define the API that would enable Linux to manage that.
We should implement the API for Linux to manage things from the get go.
Yep! Yet let's go one step at a time. I believe once we have the basics (EEC monitoring and recovered clock configuration) we'll be able to implement a basic functionality - and add bells and whistles later on, as there are more capabilities that we could support in SW.
The set API may shape how the get API looks. We need a minimal viable API where the whole control part of it is not "firmware or proprietary tools take care of that".
Do you have public docs on how the whole solution works?
The DPLL will operate on pins, so it will have a pin connected from the MAC/PHY that will have the recovered clock, but the recovered clock can be enabled from any port/lane. That information is kept in the MAC/PHY and the DPLL side will not be aware who it belongs to.
So the clock outputs are muxed to a single pin at the Ethernet IP level, in your design. I wonder if this is the common implementation and therefore if it's safe to bake that into the API. Input from other vendors would be great...
I believe this is the state-of-the-art: here's the Broadcom public one https://docs.broadcom.com/doc/1211168567832, and I believe Marvell has a similar solution. But I would also be happy to hear from others.
Interesting. That reveals the need for also marking the backup (/secondary) clock.
Have you seen any docs on how systems with discrete PHY ASICs mux the clocks?
Also do I understand correctly that the output of the Ethernet IP is just the raw Rx clock once receiver is locked and the DPLL which enum if_synce_state refers to is in the time IP, that DPLL could be driven by GNSS etc?
Ethernet IP/PHY usually outputs a divided clock signal (since it's easier to route) derived from the RX clock. The DPLL connectivity is vendor-specific, as you can use it to connect some external signals, but you can as well just care about relaying the SyncE clock and only allow recovering it and passing along the QL info when your EEC is locked. That's why I backed off from a full DPLL implementation in favor of a more generic EEC clock.
What is an EEC clock? To me the PLL state in the Ethernet port is the state of the recovered clock. enum if_eec_state has values like holdover which seem to be more applicable to the "system wide" PLL.
Let me ask this - if one port is training the link and the other one has the lock and is the source - what state will be reported for each port?
The Time IP is again relative and vendor-specific. If SyncE is deployed alongside PTP it will most likely be tightly coupled, but if you only care about having a frequency source - it's not mandatory and it can be as well in the PHY IP.
I would not think having just the freq is very useful.
Also I think I will strip the reported states to the bare minimum defined in the ITU-T G.781 instead of reusing the states that were already defined for a specific DPLL.
This is the right thinking. The DPLL can also have different external sources, like the GNSS, and can also drive different output clocks. But for the most basic SyncE implementation, which only runs on a recovered clock, we won't need the DPLL subsystem.
The GNSS pulse would come in over an external pin, tho, right? Your earlier version of the patchset had GNSS as an enum value, how would the driver / FW know that a given pin means GNSS?
The GNSS 1PPS will more likely go directly to the "full" DPLL. The pin topology can be derived from FW or any vendor-specific way of mapping pins to their sources. And, in the "worst" case, it can just be hardcoded for a specific device.
-----Original Message----- From: Jakub Kicinski kuba@kernel.org Sent: Tuesday, September 7, 2021 9:48 PM Subject: Re: [PATCH net-next 1/2] rtnetlink: Add new RTM_GETEECSTATE message to get SyncE status
On Tue, 7 Sep 2021 15:47:05 +0000 Machnikowski, Maciej wrote:
It can be either in FW or in Linux - depending on the deployment. We try to define the API that would enable Linux to manage that.
We should implement the API for Linux to manage things from the get go.
Yep! Yet let's go one step at a time. I believe once we have the basics (EEC monitoring and recovered clock configuration) we'll be able to implement a basic functionality - and add bells and whistles later on, as there are more capabilities that we could support in SW.
The set API may shape how the get API looks. We need a minimal viable API where the whole control part of it is not "firmware or proprietary tools take care of that".
Do you have public docs on how the whole solution works?
The best reference would be my netdev 0x15 tutorial: https://netdevconf.info/0x15/session.html?Introduction-to-time-synchronizati... The SyncE API considerations start at ~54:00, but basically we need an API to:
- control the lane-to-pin mapping for clock recovery
- check the EEC/DPLL state and see what's the source of the reference frequency (in more advanced deployments)
- control additional input and output pins (GNSS input, external inputs, recovered frequency reference)
The DPLL will operate on pins, so it will have a pin connected from the MAC/PHY that will have the recovered clock, but the recovered clock can be enabled from any port/lane. That information is kept in the MAC/PHY and the DPLL side will not be aware who it belongs to.
So the clock outputs are muxed to a single pin at the Ethernet IP level, in your design. I wonder if this is the common implementation and therefore if it's safe to bake that into the API. Input from other vendors would be great...
I believe this is the state-of-the-art: here's the Broadcom public one https://docs.broadcom.com/doc/1211168567832, and I believe Marvell has a similar solution. But I would also be happy to hear from others.
Interesting. That reveals the need for also marking the backup (/secondary) clock.
That's optional, but useful. And here's where we need a feedback on which port/lane is currently used, as the switch may be automatic
Have you seen any docs on how systems with discrete PHY ASICs mux the clocks?
Yes - unfortunately they are not public :(
Also do I understand correctly that the output of the Ethernet IP is just the raw Rx clock once receiver is locked and the DPLL which enum if_synce_state refers to is in the time IP, that DPLL could be driven by GNSS etc?
Ethernet IP/PHY usually outputs a divided clock signal (since it's easier to route) derived from the RX clock. The DPLL connectivity is vendor-specific, as you can use it to connect some external signals, but you can as well just care about relaying the SyncE clock and only allow recovering it and passing along the QL info when your EEC is locked. That's why I backed off from a full DPLL implementation in favor of a more generic EEC clock.
What is an EEC clock? To me the PLL state in the Ethernet port is the state of the recovered clock. enum if_eec_state has values like holdover which seem to be more applicable to the "system wide" PLL.
EEC is Ethernet Equipment Clock. In most cases this will be a DPLL, but that's not mandatory and I believe it may be different in switches where you only need to drive all ports' TX from a single frequency source. In this case the DPLL can be embedded in the multiport PHY.
Let me ask this - if one port is training the link and the other one has the lock and is the source - what state will be reported for each port?
In this case the port that is the lock source will report the lock and the EEC_SRC_PORT flag. The port that trains the link will show the lock without the flag, and once it completes the training sequence it will use the EEC's frequency to transmit data, so that the next hop is able to synchronize its EEC to the incoming RX frequency.
The Time IP is again relative and vendor-specific. If SyncE is deployed alongside PTP it will most likely be tightly coupled, but if you only care about having a frequency source - it's not mandatory and it can be as well in the PHY IP.
I would not think having just the freq is very useful.
This depends on the deployment. There are a couple of popular frequencies; the most popular are 2,048 kHz, 10 MHz and 64 kHz. There are many deployments that only require frequency sync without the phase and/or time. E.g. if you deploy frequency-division duplex you only need the frequency reference, and the higher the frequency you have, the faster you can lock to it.
Also I think I will strip the reported states to the bare minimum defined in the ITU-T G.781 instead of reusing the states that were already defined for a specific DPLL.
This is the right thinking. The DPLL can also have different external sources, like the GNSS, and can also drive different output clocks. But for the most basic SyncE implementation, which only runs on a recovered clock, we won't need the DPLL subsystem.
The GNSS pulse would come in over an external pin, tho, right? Your earlier version of the patchset had GNSS as an enum value, how would the driver / FW know that a given pin means GNSS?
The GNSS 1PPS will more likely go directly to the "full" DPLL. The pin topology can be derived from FW or any vendor-specific way of mapping pins to their sources. And, in the "worst" case, it can just be hardcoded for a specific device.
On Wed, 8 Sep 2021 08:03:35 +0000 Machnikowski, Maciej wrote:
Yep! Yet let's go one step at a time. I believe once we have the basics (EEC monitoring and recovered clock configuration) we'll be able to implement a basic functionality - and add bells and whistles later on, as there are more capabilities that we could support in SW.
The set API may shape how the get API looks. We need a minimal viable API where the whole control part of it is not "firmware or proprietary tools take care of that".
Do you have public docs on how the whole solution works?
The best reference would be my netdev 0x15 tutorial: https://netdevconf.info/0x15/session.html?Introduction-to-time-synchronizati... The SyncE API considerations start at ~54:00, but basically we need an API to:
- control the lane-to-pin mapping for clock recovery
- check the EEC/DPLL state and see what's the source of the reference frequency (in more advanced deployments)
- control additional input and output pins (GNSS input, external inputs, recovered frequency reference)
Thanks, good presentation! I haven't seen much in terms of system design, at the level similar to the Broadcom whitepaper you shared.
I believe this is the state-of-the-art: here's the Broadcom public one https://docs.broadcom.com/doc/1211168567832, and I believe Marvell has a similar solution. But I would also be happy to hear from others.
Interesting. That reveals the need for also marking the backup (/secondary) clock.
That's optional, but useful. And here's where we need a feedback on which port/lane is currently used, as the switch may be automatic
What do you mean by optional? How does the user know if there is fallback or not? Is it that Intel is intending to support only devices with up to 2 ports and the second port is always the backup (apologies for speculating).
Have you seen any docs on how systems with discrete PHY ASICs mux the clocks?
Yes - unfortunately they are not public :(
Mumble, mumble. Ido, Florian - any devices within your purview which would support SyncE?
Ethernet IP/PHY usually outputs a divided clock signal (since it's easier to route) derived from the RX clock. The DPLL connectivity is vendor-specific, as you can use it to connect some external signals, but you can as well just care about relaying the SyncE clock and only allow recovering it and passing along the QL info when your EEC is locked. That's why I backed off from a full DPLL implementation in favor of a more generic EEC clock.
What is an EEC clock? To me the PLL state in the Ethernet port is the state of the recovered clock. enum if_eec_state has values like holdover which seem to be more applicable to the "system wide" PLL.
EEC is Ethernet Equipment Clock. In most cases this will be a DPLL, but that's not mandatory and I believe it may be different in switches where you only need to drive all ports' TX from a single frequency source. In this case the DPLL can be embedded in the multiport PHY.
Let me ask this - if one port is training the link and the other one has the lock and is the source - what state will be reported for each port?
In this case the port that is the lock source will report the lock and the EEC_SRC_PORT flag. The port that trains the link will show the lock without the flag, and once it completes the training sequence it will use the EEC's frequency to transmit data, so that the next hop is able to synchronize its EEC to the incoming RX frequency.
Alright, I don't like that. It feels like you're attaching one object's information (EEC) to other objects (ports), and repeating it. Prof Goczyła and dr Landowska would not be proud.
The Time IP is again relative and vendor-specific. If SyncE is deployed alongside PTP it will most likely be tightly coupled, but if you only care about having a frequency source - it's not mandatory and it can be as well in the PHY IP.
I would not think having just the freq is very useful.
This depends on the deployment. There are a couple of popular frequencies; the most popular are 2,048 kHz, 10 MHz and 64 kHz. There are many deployments that only require frequency sync without the phase and/or time. E.g. if you deploy frequency-division duplex you only need the frequency reference, and the higher the frequency you have, the faster you can lock to it.
-----Original Message----- From: Jakub Kicinski kuba@kernel.org Sent: Wednesday, September 8, 2021 6:21 PM Subject: Re: [PATCH net-next 1/2] rtnetlink: Add new RTM_GETEECSTATE message to get SyncE status
On Wed, 8 Sep 2021 08:03:35 +0000 Machnikowski, Maciej wrote:
Yep! Yet let's go one step at a time. I believe once we have the basics (EEC monitoring and recovered clock configuration) we'll be able to implement a basic functionality - and add bells and whistles later on, as there are more capabilities that we could support in SW.
The set API may shape how the get API looks. We need a minimal viable API where the whole control part of it is not "firmware or proprietary tools take care of that".
Do you have public docs on how the whole solution works?
The best reference would be my netdev 0x15 tutorial: https://netdevconf.info/0x15/session.html?Introduction-to-time-synchronization-over-Ethernet
The SyncE API considerations start at ~54:00, but basically we need an API to:
- control the lane-to-pin mapping for clock recovery
- check the EEC/DPLL state and see what's the source of the reference frequency (in more advanced deployments)
- control additional input and output pins (GNSS input, external inputs, recovered frequency reference)
Thanks, good presentation! I haven't seen much in terms of system design, at the level similar to the Broadcom whitepaper you shared.
See the "drawing" below.
I believe this is the state-of-the-art: here's the Broadcom public one https://docs.broadcom.com/doc/1211168567832, and I believe Marvell has a similar solution. But I would also be happy to hear from others.
Interesting. That reveals the need for also marking the backup (/secondary) clock.
That's optional, but useful. And here's where we need a feedback on which port/lane is currently used, as the switch may be automatic
What do you mean by optional? How does the user know if there is fallback or not? Is it that Intel is intending to support only devices with up to 2 ports and the second port is always the backup (apologies for speculating).
That will be a part of pin config API. It needs to give info about the number of supported pins that the PHY/MAC can use as recovered clock outputs. Once the pin is assigned to the lane the recovered clock (divided or not) will appear on the configured PHY/MAC pin and EEC will be able to use it. If there is more than 1 available - they will have some priority assigned to them that will be known to the EEC/board designer.
And we plan to support devices that come with only one recovered clock output as well.
Have you seen any docs on how systems with discrete PHY ASICs mux the clocks?
Yes - unfortunately they are not public :(
Mumble, mumble. Ido, Florian - any devices within your purview which would support SyncE?
OK, found one that's public: https://www.marvell.com/content/dam/marvell/en/public-collateral/transceiver... see Fig. 23 and chapter 3.11 for details, but conceptually it's similar.
Ethernet IP/PHY usually outputs a divided clock signal (since it's easier to route) derived from the RX clock. The DPLL connectivity is vendor-specific, as you can use it to connect some external signals, but you can as well just care about relaying the SyncE clock and only allow recovering it and passing along the QL info when your EEC is locked. That's why I backed off from a full DPLL implementation in favor of a more generic EEC clock.
What is an EEC clock? To me the PLL state in the Ethernet port is the state of the recovered clock. enum if_eec_state has values like holdover which seem to be more applicable to the "system wide" PLL.
EEC is Ethernet Equipment Clock. In most cases this will be a DPLL, but that's not mandatory and I believe it may be different in switches where you only need to drive all ports' TX from a single frequency source. In this case the DPLL can be embedded in the multiport PHY.
Let me ask this - if one port is training the link and the other one has the lock and is the source - what state will be reported for each port?
In this case the port that is the lock source will report the lock and the EEC_SRC_PORT flag. The port that trains the link will show the lock without the flag, and once it completes the training sequence it will use the EEC's frequency to transmit data, so that the next hop is able to synchronize its EEC to the incoming RX frequency.
Alright, I don't like that. It feels like you're attaching one object's information (EEC) to other objects (ports), and repeating it. Prof Goczyła and dr Landowska would not be proud.
Hahaha - did not expect them to pop up here :) It's true, but the problem is that they depend on each other. The path is:
Lane0  -------------|\   Pin0              RefN ____
  ...  -------------| |-------------------|         | synced clk
       -------------| |-------------------|  EEC    |------------------
LaneN  -------------|/   PinN              RefM|____|
                   MUX
To get the full info a port needs to know the EEC state and which lane is used as a source (or rather - my lane or any other). The Lane -> Pin mapping is buried in the PHY/MAC, but the source of frequency is in the EEC. What's more, the Pin -> Ref mapping is board-specific.
The viable solutions are:
- limit to the proposed "I drive the clock" vs "someone else drives it" and assume the driver returns all info
- return the EEC Ref index, figure out which pin is connected to it and then check which MAC/PHY lane drives it
I assume option 1 is easy to implement and keep in the future even if we finally move to option 2 once we define the EEC/DPLL subsystem.
In the future #1 can take the lock information from the DPLL subsystem, but it will also enable simple deployments that won't expose the whole DPLL, like a filter PLL embedded in a multiport PHY that will only work for SyncE, in which case this API will only touch a single component.
The Time IP is again relative and vendor-specific. If SyncE is deployed alongside PTP it will most likely be tightly coupled, but if you only care about having a frequency source - it's not mandatory and it can be as well in the PHY IP.
I would not think having just the freq is very useful.
This depends on the deployment. There are a couple of popular frequencies; the most popular are 2,048 kHz, 10 MHz and 64 kHz. There are many deployments that only require frequency sync without the phase and/or time. E.g. if you deploy frequency-division duplex you only need the frequency reference, and the higher the frequency you have, the faster you can lock to it.
The SyncE API considerations start at ~54:00, but basically we need an API to:
- control the lane-to-pin mapping for clock recovery
- check the EEC/DPLL state and see what's the source of the reference frequency (in more advanced deployments)
- control additional input and output pins (GNSS input, external inputs, recovered frequency reference)
Now that you have pointed to a datasheet...
- Controlling the lane to pin mapping for clock recovery
So this is a PHY property. That could be Linux driving the PHY, via phylib, drivers/net/phy, or there could be firmware in the MAC driver which hides the PHY and gives you some sort of API to access it.
Check the EEC/DPLL state and see what's the source of reference frequency
Where is the EEC/DPLL implemented? Is it typically also in the PHY? Or some other hardware block?
I just want to make sure we have an API which we can easily delegate to different subsystems, some of it in the PHY driver, maybe some of it somewhere else.
Also, looking at the Marvell datasheet, it appears these registers are in the MDIO_MMD_VEND2 range. Has any of this been specified? Can we expect to be able to write a generic implementation sometime in the future which PHY drivers can share?
I just looked at a 1G Marvell PHY. It uses RGMII or SGMII towards the host. But there is no indication you can take the clock from the SGMII SERDES, it is only the recovered clock from the line. And the recovered clock always goes out the CLK125 pin, which can either be 125MHz or 25MHz.
So in this case, you have no need to control the lane to pin mapping, it is fixed, but do we want to be able to control the divider?
Do we need a mechanism to actually enumerate what the hardware can do?
Since we are talking about clocks and dividers, and multiplexors, should all this be using the common clock framework, which already supports most of this? Do we actually need something new?
Andrew
-----Original Message-----
From: Andrew Lunn <andrew@lunn.ch>
Sent: Wednesday, September 8, 2021 9:35 PM
To: Machnikowski, Maciej <maciej.machnikowski@intel.com>
Subject: Re: [PATCH net-next 1/2] rtnetlink: Add new RTM_GETEECSTATE message to get SyncE status
The SyncE API considerations start at ~54:00, but basically we need an API for:
- Controlling the lane-to-pin mapping for clock recovery
- Checking the EEC/DPLL state and seeing what's the source of the reference frequency (in more advanced deployments)
- Controlling additional input and output pins (GNSS input, external inputs, recovered frequency reference)
Now that you have pointed to a datasheet...
- Controlling the lane to pin mapping for clock recovery
So this is a PHY property. That could be Linux driving the PHY, via phylib, drivers/net/phy, or there could be firmware in the MAC driver which hides the PHY and gives you some sort of API to access it.
Yes, that's deployment dependent - in our case we use a MAC driver that proxies that.
Check the EEC/DPLL state and see what's the source of reference frequency
Where is the EEC/DPLL implemented? Is it typically also in the PHY? Or some other hardware block?
In most cases it will be an external device, but there are implementations that have a PLL/DPLL inside the PHY (like the Broadcom example). And I don't know how switches implement that.
I just want to make sure we have an API which we can easily delegate to different subsystems, some of it in the PHY driver, maybe some of it somewhere else.
The reasoning behind putting it in the ndo subsystem is that ultimately the netdev receives the ESMC message and is the entity that should know how it is connected to the PHY. It would be very complex to move the mapping between netdevs and PHY lanes/ports into software running in userspace. I believe the right way is to let the netdev driver manage that complexity and forward the request to the PHY subsystem if needed. The logic is:
- Receive an ESMC message on a netdev
- Send the message to enable the recovered clock to the netdev that received it
- Ask that netdev for the status of the EEC that drives it.
Also, looking at the Marvell datasheet, it appears these registers are in the MDIO_MMD_VEND2 range. Has any of this been specified? Can we expect to be able to write a generic implementation sometime in the future which PHY drivers can share?
I believe it will be vendor-specific, as there are different implementations of it. It can be an internal PLL that gets all inputs and syncs to one of them, or it can be a MUX with divider(s).
I just looked at a 1G Marvell PHY. It uses RGMII or SGMII towards the host. But there is no indication you can take the clock from the SGMII SERDES, it is only the recovered clock from the line. And the recovered clock always goes out the CLK125 pin, which can either be 125MHz or 25MHz.
So in this case, you have no need to control the lane to pin mapping, it is fixed, but do we want to be able to control the divider?
That's a complex question. Controlling the divider would make sense when we have control over the DPLL, and it still needs to be optional, as it's not always available. I assume that in the initial implementation we can rely on the MAC driver to set up the dividers to output the expected frequency to the DPLL. On faster interfaces the recovered clock frequency is also link-speed dependent, so it differs across link speeds.
Do we need a mechanism to actually enumerate what the hardware can do?
For recovered clocks I think we need to start with the number of PHY outputs that go towards the DPLL. We can add additional attributes, like frequency, once we need them. I want to avoid creating a Swiss Army knife with too many options here :)
Since we are talking about clocks and dividers, and multiplexors, should all this be using the common clock framework, which already supports most of this? Do we actually need something new?
I believe that's a specific enough case that it deserves a separate one.

Regards
Maciek
On Wed, 8 Sep 2021 21:34:37 +0200 Andrew Lunn wrote:
Since we are talking about clocks and dividers, and multiplexors, should all this be using the common clock framework, which already supports most of this? Do we actually need something new?
Does the common clock framework expose any user space API?
On Wed, Sep 08, 2021 at 03:20:27PM -0700, Jakub Kicinski wrote:
On Wed, 8 Sep 2021 21:34:37 +0200 Andrew Lunn wrote:
Since we are talking about clocks and dividers, and multiplexors, should all this be using the common clock framework, which already supports most of this? Do we actually need something new?
Does the common clock framework expose any user space API?
Ah, good point. No, I don't think it does, apart from debugfs, which is not really a user space API, and it contains read-only descriptions of the clock tree, current status, mux settings, dividers etc.

So probably anybody implementing the proposed API the Linux way will use the common clock framework and its internal API, and debug the implementation via debugfs.

PHY drivers already do use it, e.g. when the PHY is using a clock provided by the MAC, it uses the API to enable/disable it and set its frequency as needed.
Andrew
On Thu, Sep 09, 2021 at 12:59:27AM +0200, Andrew Lunn wrote:
On Wed, Sep 08, 2021 at 03:20:27PM -0700, Jakub Kicinski wrote:
On Wed, 8 Sep 2021 21:34:37 +0200 Andrew Lunn wrote:
Since we are talking about clocks and dividers, and multiplexors, should all this be using the common clock framework, which already supports most of this? Do we actually need something new?
Does the common clock framework expose any user space API?
Ah, good point. No, I don't think it does, apart from debugfs, which is not really a user space API, and it contains read-only descriptions of the clock tree, current status, mux settings, dividers etc.
Wouldn't it make sense to develop some kind of user land API to manipulate the common clock framework at run time?
Just dreaming...
Richard
-----Original Message-----
From: Richard Cochran <richardcochran@gmail.com>
Sent: Thursday, September 9, 2021 4:09 AM
To: Andrew Lunn <andrew@lunn.ch>
Subject: Re: [PATCH net-next 1/2] rtnetlink: Add new RTM_GETEECSTATE message to get SyncE status
On Thu, Sep 09, 2021 at 12:59:27AM +0200, Andrew Lunn wrote:
On Wed, Sep 08, 2021 at 03:20:27PM -0700, Jakub Kicinski wrote:
On Wed, 8 Sep 2021 21:34:37 +0200 Andrew Lunn wrote:
Since we are talking about clocks and dividers, and multiplexors, should all this be using the common clock framework, which already supports most of this? Do we actually need something new?
Does the common clock framework expose any user space API?
Ah, good point. No, I don't think it does, apart from debugfs, which is not really a user space API, and it contains read-only descriptions of the clock tree, current status, mux settings, dividers etc.
Wouldn't it make sense to develop some kind of user land API to manipulate the common clock framework at run time?
Just dreaming...
Richard
That may be dangerous. In the SyncE world we control clocks that are only references to the EEC block. The worst that can happen is that the DPLL will ignore an incoming clock because it has an invalid frequency, or it will drift off to the edge of the allowed range. Controlling a clock that actually drives components (PHY/MAC) at runtime is a good way to brick the part. I feel that, while the reuse of structures may be a good idea, a userspace API for clocks is not. They are usually set up once at board init and stay like that "forever".
The outputs we need to control are only a subset of all of them, and they fall more into the PTP-pins level of detail than the clock one.
On Thu, Sep 09, 2021 at 08:18:10AM +0000, Machnikowski, Maciej wrote:
Controlling the clock that actually drives any components (PHY/MAC) in runtime can be a good way to brick the part.
I didn't say that.
I feel that, while the reuse of structures may be a good idea, the userspace API for clocks is not. They are usually set up once at the board init level and stay like that "forever".
The outputs we need to control are only a subset of all of them and they rather fall in the PTP pins level of details, rather than clock ones.
clk-gate.c
clk-mux.c
Making that available for user space to twiddle is a better way than tacking onto the PTP stuff.
You can model your device as having a multiplexer in front of it.
Thanks, Richard
On Wed, 8 Sep 2021 17:30:24 +0000 Machnikowski, Maciej wrote:
 Lane0  -------------  |\   Pin0      RefN  ____
--------|           |--| |-----------------|    |  synced clk
        |           |--| |-----------------| EEC|------------------
 LaneN  -------------  |/   PinN      RefM |____|
                       MUX
To get the full info a port needs to know the EEC state and which lane is used as a source (or rather - my lane or any other).
EEC here is what the PHY documentation calls "Cleanup PLL" right?
The lane -> Pin mapping is buried in the PHY/MAC, but the source of frequency is in the EEC.
Not sure what "source of frequency" means here. There's a lot of frequencies here.
What's even more - the Pin->Ref mapping is board specific.
Breaking down the system into components we have:
Port
- A.1 Rx lanes
- A.2 Rx pins (outputs)
- A.3 Rx clk divider
- B.1 Tx lanes
- B.2 Tx pins (inputs)

EEC
- C.1 Inputs
- C.2 Outputs
- C.3 PLL state

In the most general case we want to be able to:
- map recovered clocks to PHY output pins (A.1 <> A.2)
- set freq div on the recovered clock (A.2 <> A.3)
- set the priorities of inputs on the EEC (C.1)
- read the EEC state (C.3)
- control outputs of the EEC (C.2)
- select the clock source for port Tx (B.2 <> B.1)
As you said, pin -> ref mapping is board specific, so the API should not assume knowledge of the routing between Port and EEC. If it does, just give the pins matching names.

We don't have to implement the entire design, but the pieces we do create must be right for the larger context. With the current code the EEC/Cleanup PLL is not represented as a separate entity, and the mapping of what "source" means is on the wrong "end" of the A.3 <> C.1 relationship.
The viable solutions are:
- Limit to the proposed "I drive the clock" vs "Someone drives it" and assume the driver returns all info
- Return the EEC Ref index, figure out which pin is connected to it, and then check which MAC/PHY lane drives it.
I assume option one is easy to implement and keep in the future even if we finally move to option 2 once we define EEC/DPLL subsystem.
In future #1 can take the lock information from the DPLL subsystem, but will also enable simple deployments that won't expose the whole DPLL, like a filter PLL embedded in a multiport PHY that will only work for SyncE in which case this API will only touch a single component.
Imagine a system with two cascaded switch ASICs and a bunch of PHYs. How do you express that by pure extensions to the proposed API? Here either the cleanup PLLs would be cascaded (the subordinate one needs to express that its "source" is another PLL) or a single lane can be designated as the source for both PLLs (but then there is only one "source" bit and multiple "enum if_eec_state"s).
I think we can't avoid having a separate object for the EEC/Cleanup PLL. You can add it as a subobject to devlink, but a new genetlink family seems much preferable given the devlink instances themselves have unclear semantics at this point. Or you can try to convince Richard that the EEC belongs as part of PTP :)

In fact I don't think you care about any of the PHY / port stuff currently. All you need is the EEC side of the API. IIUC you have a relatively simple setup where there is only one pin per port, and you don't care about syncing the Tx clock.
As you said, pin -> ref mapping is board specific, so the API should not assume knowledge of the routing between Port and EEC.

That information will probably end up in device tree. And X different implementations of ACPI, unless somebody puts their foot down and stops the snowflakes.
Imagine a system with two cascaded switch ASICs and a bunch of PHYs. How do you express that by pure extensions to the proposed API?
Device tree is good at that. ACPI might eventually catch up.
How complex a setup do we actually expect? Can there be multiple disjoint SyncE trees within an Ethernet switch cluster? Or would it be reasonable to say all you need to configure is the clock source, and all other ports of the switches are slaves if SyncE is enabled for the port? I've never seen any SOHO switch hardware which allows you to have disjoint PTP trees, so it does not sound too unreasonable to only allow a single SyncE tree per switch cluster.
Also, if you are cascading switches, you generally don't put PHYs in the middle, you just connect the SERDES lanes together.
Andrew
On Thu, 9 Sep 2021 01:14:36 +0200 Andrew Lunn wrote:
As you said, pin -> ref mapping is board specific, so the API should not assume knowledge of the routing between Port and EEC.

That information will probably end up in device tree. And X different implementations of ACPI, unless somebody puts their foot down and stops the snowflakes.
Imagine a system with two cascaded switch ASICs and a bunch of PHYs. How do you express that by pure extensions to the proposed API?
Device tree is good at that. ACPI might eventually catch up.
I could well be wrong, but some of those connectors could well be just SMAs on the back plate, no? Users could cable those up to their heart's content.
https://engineering.fb.com/2021/08/11/open-source/time-appliance/
How complex a setup do we actually expect? Can there be multiple disjoint SyncE trees within an Ethernet switch cluster? Or would it be reasonable to say all you need to configure is the clock source, and all other ports of the switches are slaves if SyncE is enabled for the port? I've never seen any SOHO switch hardware which allows you to have disjoint PTP trees, so it does not sound too unreasonable to only allow a single SyncE tree per switch cluster.
Not sure. I get the (perhaps unfounded) feeling that just forwarding the clock from one port to the rest is simpler. Maciej cares primarily about exposing the clock to other non-Ethernet things, the device would be an endpoint here, using the clock for LTE or whatnot.
Also, if you are cascading switches, you generally don't put PHYs in the middle, you just connect the SERDES lanes together.
My concern was a case where PHY connected to one switch exposes the refclock on an aux pin which is then muxed to more than one switch ASIC. IOW the "source port" is not actually under the same switch.
Again, IDK if that's realistic.
-----Original Message-----
From: Jakub Kicinski <kuba@kernel.org>
Sent: Thursday, September 9, 2021 1:58 AM
To: Andrew Lunn <andrew@lunn.ch>
Cc: Machnikowski, Maciej <maciej.machnikowski@intel.com>; netdev@vger.kernel.org; intel-wired-lan@lists.osuosl.org; richardcochran@gmail.com; abyagowi@fb.com; Nguyen, Anthony L <anthony.l.nguyen@intel.com>; davem@davemloft.net; linux-kselftest@vger.kernel.org; Michal Kubecek <mkubecek@suse.cz>; Saeed Mahameed <saeed@kernel.org>; Michael Chan <michael.chan@broadcom.com>
Subject: Re: [PATCH net-next 1/2] rtnetlink: Add new RTM_GETEECSTATE message to get SyncE status
On Thu, 9 Sep 2021 01:14:36 +0200 Andrew Lunn wrote:
As you said, pin -> ref mapping is board specific, so the API should not assume knowledge of the routing between Port and EEC.

That information will probably end up in device tree. And X different implementations of ACPI, unless somebody puts their foot down and stops the snowflakes.
Imagine a system with two cascaded switch ASICs and a bunch of PHYs. How do you express that by pure extensions to the proposed API?
Device tree is good at that. ACPI might eventually catch up.
I could well be wrong, but some of those connectors could well be just SMAs on the back plate, no? Users could cable those up to their heart's content.
https://engineering.fb.com/2021/08/11/open-source/time-appliance/
Yep! We should rely on abstract indexes rather than the device tree. The user needs to make sense of the indexes, just like with PTP pins.
How complex a setup do we actually expect? Can there be multiple disjoint SyncE trees within an Ethernet switch cluster? Or would it be reasonable to say all you need to configure is the clock source, and all other ports of the switches are slaves if SyncE is enabled for the port? I've never seen any SOHO switch hardware which allows you to have disjoint PTP trees, so it does not sound too unreasonable to only allow a single SyncE tree per switch cluster.
Not sure. I get the (perhaps unfounded) feeling that just forwarding the clock from one port to the rest is simpler. Maciej cares primarily about exposing the clock to other non-Ethernet things, the device would be an endpoint here, using the clock for LTE or whatnot.
Also, multi-PTP-domain switches are starting to appear on the market. I know Cisco makes them: https://www.cisco.com/c/en/us/td/docs/switches/datacenter/nexus3548/sw/syste... Disjoint SyncE trees will be harder to implement due to the physical nature of it.
Also, if you are cascading switches, you generally don't put PHYs in the middle, you just connect the SERDES lanes together.
My concern was a case where PHY connected to one switch exposes the refclock on an aux pin which is then muxed to more than one switch ASIC. IOW the "source port" is not actually under the same switch.
Again, IDK if that's realistic.
It can be done, but the userspace app should make sense of this config. Coding all scenarios in the kernel would mean 1,000,000 different possible configurations in the driver - which probably no one wants to keep/maintain. We can return info about hardwired pins (like GNSS, PHY recovered clocks, the PTP clock and so on) but should also give some generic ones.
Dave,
Are there any free slots at Plumbers to discuss and close on SyncE interfaces (or can we add an extra one)? I can reuse the slides from Netdev to give background, and a live discussion may help close the opens around it. I'd be happy to co-present with anyone who wants to also join this effort.
Regards Maciek
From: "Machnikowski, Maciej" <maciej.machnikowski@intel.com>
Date: Thu, 9 Sep 2021 09:24:07 +0000
Dave,
Are there any free slots at Plumbers to discuss and close on SyncE interfaces (or can we add an extra one)? I can reuse the slides from Netdev to give background, and a live discussion may help close the opens around it. I'd be happy to co-present with anyone who wants to also join this effort.
Sorry, I think it's much too late for this.
-----Original Message-----
From: Jakub Kicinski <kuba@kernel.org>
Sent: Thursday, September 9, 2021 12:19 AM
To: Machnikowski, Maciej <maciej.machnikowski@intel.com>
Subject: Re: [PATCH net-next 1/2] rtnetlink: Add new RTM_GETEECSTATE message to get SyncE status
On Wed, 8 Sep 2021 17:30:24 +0000 Machnikowski, Maciej wrote:
 Lane0  -------------  |\   Pin0      RefN  ____
--------|           |--| |-----------------|    |  synced clk
        |           |--| |-----------------| EEC|------------------
 LaneN  -------------  |/   PinN      RefM |____|
                       MUX
To get the full info a port needs to know the EEC state and which lane is used as a source (or rather - my lane or any other).
EEC here is what the PHY documentation calls "Cleanup PLL" right?
It's a generic term for an internal or external PLL that takes the reference frequency from a certain port/lane and drives its TX part. In a specific case it can be the internal cleanup PLL.
The lane -> Pin mapping is buried in the PHY/MAC, but the source of frequency is in the EEC.
Not sure what "source of frequency" means here. There's a lot of frequencies here.
It's the source lane/port that drives the DPLL reference. In other words, the DPLL will tune its frequency to match the one recovered from a certain PHY lane/port.
What's even more - the Pin->Ref mapping is board specific.
Breaking down the system into components we have:
Port
- A.1 Rx lanes
- A.2 Rx pins (outputs)
- A.3 Rx clk divider
- B.1 Tx lanes
- B.2 Tx pins (inputs)

EEC
- C.1 Inputs
- C.2 Outputs
- C.3 PLL state
In the most general case we want to be able to:
- map recovered clocks to PHY output pins (A.1 <> A.2)
- set freq div on the recovered clock (A.2 <> A.3)
Technically the pin (if it exists) will be the last thing, so a better way would be to map the lane to the divider (A.1 <> A.3) and then the divider to a pin (A.3 <> A.2), but the idea is the same.
- set the priorities of inputs on the EEC (C.1)
- read the EEC state (C.3)
- control outputs of the EEC (C.2)
Yes!
select the clock source for port Tx (B.2 <> B.1)
And here we usually don't allow control over this. The DPLL is preconfigured to output the frequency that's expected on the PHY/MAC clock input, and we shouldn't mess with it at runtime.
As you said, pin -> ref mapping is board specific, so the API should not assume knowledge of the routing between Port and EEC. If it does, just give the pins matching names.
I believe referring to a board user guide is enough to cover that one, just like with PTP pins. There may be 1000 different ways of connecting those signals.
We don't have to implement the entire design, but the pieces we do create must be right for the larger context. With the current code the EEC/Cleanup PLL is not represented as a separate entity, and the mapping of what "source" means is on the wrong "end" of the A.3 <> C.1 relationship.
That's why I initially started with the EEC state + pin idx of the driving source. I believe it's a cleaner solution: we can implement the same pin indexes in the recovered clock part of the interface, and the user will be able to see the state and the driving pin from the EEC_STATE (both belong to the DPLL) and use the PHY pins subsystem to check whether the driving pin index matches the index that I drive. In that case we keep all C-things in the C subsystem and A stuff in the A subsystem. The only "mix" is in the pin indexes, which would use numbering from C. If it's an attribute, it can as well be optional for deployments that don't need/support it.
The viable solutions are:
- Limit to the proposed "I drive the clock" vs "Someone drives it" and assume the driver returns all info
- Return the EEC Ref index, figure out which pin is connected to it, and then check which MAC/PHY lane drives it.

I assume option one is easy to implement and keep in the future even if we finally move to option 2 once we define an EEC/DPLL subsystem.
In the future #1 can take the lock information from the DPLL subsystem, but it will also enable simple deployments that won't expose the whole DPLL, like a filter PLL embedded in a multiport PHY that will only work for SyncE, in which case this API will only touch a single component.
Imagine a system with two cascaded switch ASICs and a bunch of PHYs. How do you express that by pure extensions to the proposed API? Here either the cleanup PLLs would be cascaded (the subordinate one needs to express that its "source" is another PLL) or a single lane can be designated as the source for both PLLs (but then there is only one "source" bit and multiple "enum if_eec_state"s).
In that case - once we have pins - we'll see that the leader DPLL is synced to the pin that comes from the PHY/MAC and be able to find the corresponding lane, and on the follower side we'll see that it's locked to the pin that corresponds to the leader DPLL. The logic to "do something" with that knowledge needs to be in the userspace app, as there may be external connections that are unknown at the board level (think of 2 separate adapters connected with an external cable).
I think we can't avoid having a separate object for the EEC/Cleanup PLL. You can add it as a subobject to devlink, but a new genetlink family seems much preferable given the devlink instances themselves have unclear semantics at this point. Or you can try to convince Richard that the EEC belongs as part of PTP :)
In fact I don't think you care about any of the PHY / port stuff currently. All you need is the EEC side of the API. IIUC you have a relatively simple setup where there is only one pin per port, and you don't care about syncing the Tx clock.
I actually do. There are (almost) always fewer recovered clock resources (aka pins) than ports/lanes in the system. The TX clock will be synchronized once the EEC reports the lock state, as the EEC is the part that generates clocks for the TX side of the PHY.
On Wed, Sep 08, 2021 at 09:21:15AM -0700, Jakub Kicinski wrote:
Mumble, mumble. Ido, Florian - any devices within your purview which would support SyncE?
Sorry for the delay, was AFK. I remember SyncE being mentioned a few times in the past, so it is likely that we will be requested to add SyncE support in mlxsw in the future. I will try to solicit feedback from colleagues more familiar with the subject and provide it here next week.
On Wed, Sep 08, 2021 at 09:21:15AM -0700, Jakub Kicinski wrote:
Mumble, mumble. Ido, Florian - any devices within your purview which would support SyncE?
So it's public info that Connect-X has it: https://www.mellanox.com/files/doc-2020/pb-connectx-6-dx-en-card.pdf
Given the nature of SyncE, I think you can extrapolate from that about other devices :)
Anyway, Petr and I started discussing this with the colleagues defining the HW/SW interfaces and we will help with the review to make sure we end up with a uAPI that works across multiple implementations.
On Fri, Sep 03, 2021 at 05:14:35PM +0200, Maciej Machnikowski wrote:
diff --git a/include/uapi/linux/if_link.h b/include/uapi/linux/if_link.h
index eebd3894fe89..78a8a5af8cd8 100644
--- a/include/uapi/linux/if_link.h
+++ b/include/uapi/linux/if_link.h
@@ -1273,4 +1273,35 @@ enum {

 #define IFLA_MCTP_MAX (__IFLA_MCTP_MAX - 1)

+/* SyncE section */
+enum if_eec_state {
+	IF_EEC_STATE_INVALID = 0,
+	IF_EEC_STATE_FREERUN,
+	IF_EEC_STATE_LOCKACQ,
+	IF_EEC_STATE_LOCKREC,
+	IF_EEC_STATE_LOCKED,
+	IF_EEC_STATE_HOLDOVER,
+	IF_EEC_STATE_OPEN_LOOP,
+	__IF_EEC_STATE_MAX,
Can you document these states? I'm not clear on what LOCKACQ, LOCKREC and OPEN_LOOP mean. I also don't see ice using them and it's not really a good practice to add new uAPI without any current users.
From v1 I understand that these states were copied from commit 797d3186544f ("ptp: ptp_clockmatrix: Add wait_for_sys_apll_dpll_lock.") from Renesas.
Figure 11 in the following PDF describes the states, but it seems specific to the particular device and probably shouldn't be exposed to user space as-is: https://www.renesas.com/us/en/document/dst/8a34041-datasheet
I have a few questions about this being a per-netdev attribute:
1. My understanding is that in the multi-port adapter you are working with you have a single EEC that is used to set the Tx frequency of all the ports. Is that correct?
2. Assuming the above is correct, is it possible that one port is in LOCKED state and another (for some reason) is in HOLDOVER state? If yes, then I agree about this being a per-netdev attribute. The interface can also be extended with another attribute specifying the HOLDOVER reason. For example, netdev being down.
Regardless, I agree with previous comments made about this belonging in ethtool rather than rtnetlink.
+};
+#define IF_EEC_STATE_MAX (__IF_EEC_STATE_MAX - 1)
+
+#define EEC_SRC_PORT	(1 << 0)	/* recovered clock from the port is
+					 * currently the source for the EEC
+					 */
I'm not sure about this one. If we are going to expose this as a per-netdev attribute (see more below), any reason it can't be added as another state (e.g., IF_EEC_STATE_LOCKED_SRC)?
IIUC, in the common case of a simple NE the source of the EEC is always one of the ports, but in the case of a PRC the source is most likely external (e.g., GPS).
If so, I think we need a way to represent the EEC as a separate object with the ability to set its source and report it via the same interface. I'm unclear on what exactly an external source looks like, but for the netdev case maybe something like this:
devlink clock show [ dev clock CLOCK ]
devlink clock set DEV clock CLOCK [ { src_type SRC_TYPE } ]
SRC_TYPE := { port DEV/PORT_INDEX }
The only source type above is 'port' with the ability to set the relevant port, but more can be added. Obviously, 'devlink clock show' will give you the current source in addition to other information such as frequency difference with respect to the input frequency.
Finally, can you share more info about the relation to the PHC? My understanding is that one of the primary use cases for SyncE is to drive all the PHCs in the network using the same frequency. How do you imagine this configuration is going to look like? Can the above interface be extended for that?
All of the above might be complete nonsense as I'm still trying to wrap my head around the subject.
Thanks for working on this
-----Original Message-----
From: Ido Schimmel <idosch@idosch.org>
Sent: Tuesday, September 21, 2021 3:16 PM
To: Machnikowski, Maciej <maciej.machnikowski@intel.com>
Subject: Re: [PATCH net-next 1/2] rtnetlink: Add new RTM_GETEECSTATE message to get SyncE status
On Fri, Sep 03, 2021 at 05:14:35PM +0200, Maciej Machnikowski wrote:
diff --git a/include/uapi/linux/if_link.h b/include/uapi/linux/if_link.h
index eebd3894fe89..78a8a5af8cd8 100644
--- a/include/uapi/linux/if_link.h
+++ b/include/uapi/linux/if_link.h
@@ -1273,4 +1273,35 @@ enum {

 #define IFLA_MCTP_MAX (__IFLA_MCTP_MAX - 1)

+/* SyncE section */
+enum if_eec_state {
+	IF_EEC_STATE_INVALID = 0,
+	IF_EEC_STATE_FREERUN,
+	IF_EEC_STATE_LOCKACQ,
+	IF_EEC_STATE_LOCKREC,
+	IF_EEC_STATE_LOCKED,
+	IF_EEC_STATE_HOLDOVER,
+	IF_EEC_STATE_OPEN_LOOP,
+	__IF_EEC_STATE_MAX,
Can you document these states? I'm not clear on what LOCKACQ, LOCKREC and OPEN_LOOP mean. I also don't see ice using them and it's not really a good practice to add new uAPI without any current users.
I'll fix that enum to use the generic states defined in G.781, which are limited to:
- Freerun
- LockedACQ (locked, acquiring holdover memory)
- Locked (locked with holdover acquired)
- Holdover
From v1 I understand that these states were copied from commit 797d3186544f ("ptp: ptp_clockmatrix: Add wait_for_sys_apll_dpll_lock.") from Renesas.
Figure 11 in the following PDF describes the states, but it seems specific to the particular device and probably shouldn't be exposed to user space as-is: https://www.renesas.com/us/en/document/dst/8a34041-datasheet
I have a few questions about this being a per-netdev attribute:
1. My understanding is that in the multi-port adapter you are working with you have a single EEC that is used to set the Tx frequency of all the ports. Is that correct?
That's correct.
2. Assuming the above is correct, is it possible that one port is in LOCKED state and another (for some reason) is in HOLDOVER state? If yes, then I agree about this being a per-netdev attribute. The interface can also be extended with another attribute specifying the HOLDOVER reason. For example, netdev being down.
All ports driven by a single EEC will report the same state.
Regardless, I agree with previous comments made about this belonging in ethtool rather than rtnetlink.
Will take a look at it - as it will require support in linuxptp as well.
+};
+#define IF_EEC_STATE_MAX (__IF_EEC_STATE_MAX - 1)
+
+#define EEC_SRC_PORT	(1 << 0) /* recovered clock from the port is
+				  * currently the source for the EEC
+				  */
I'm not sure about this one. If we are going to expose this as a per-netdev attribute (see more below), any reason it can't be added as another state (e.g., IF_EEC_STATE_LOCKED_SRC)?
OK! That's a great idea! Yet we'll need LOCKED_SRC and LOCKED_ACQ_SRC, but still sounds good.
IIUC, in the common case of a simple NE the source of the EEC is always one of the ports, but in the case of a PRC the source is most likely external (e.g., GPS).
True
If so, I think we need a way to represent the EEC as a separate object with the ability to set its source and report it via the same interface. I'm unclear on how exactly an external source looks like, but for the netdev case maybe something like this:
devlink clock show [ dev clock CLOCK ]
devlink clock set DEV clock CLOCK [ { src_type SRC_TYPE } ]
SRC_TYPE := { port DEV/PORT_INDEX }
Unfortunately, the EEC lives in 2 worlds - it belongs to a netdev (in very simple deployments the EEC may be a part of the PHY and only allow synchronizing the TX frequency of a single lane/port), but it can also go outside of the netdev and be a board-wide frequency source.
The only source type above is 'port' with the ability to set the relevant port, but more can be added. Obviously, 'devlink clock show' will give you the current source in addition to other information such as frequency difference with respect to the input frequency.
We considered a devlink interface for configuring the clock/DPLL, but a new concept was born on the list: adding a DPLL subsystem that will cover more use cases, like the TimeCard.
Finally, can you share more info about the relation to the PHC? My understanding is that one of the primary use cases for SyncE is to drive all the PHCs in the network using the same frequency. How do you imagine this configuration is going to look like? Can the above interface be extended for that?
That would be a configurable parameter/option of the PTP program. Just like it can check for link on a given port, it will also be able to check whether we use the EEC and whether it's locked. If it is, and it is the source of the PHC frequency, the ptp tool can decide not to apply frequency corrections to the PHC, just like ptp4l does when the nullf servo is used, but it can do that dynamically.
All of the above might be complete nonsense as I'm still trying to wrap my head around the subject.
It's certainly not! Thanks for your input!
On Tue, Sep 21, 2021 at 01:37:37PM +0000, Machnikowski, Maciej wrote:
-----Original Message-----
From: Ido Schimmel <idosch@idosch.org>
Sent: Tuesday, September 21, 2021 3:16 PM
To: Machnikowski, Maciej <maciej.machnikowski@intel.com>
Subject: Re: [PATCH net-next 1/2] rtnetlink: Add new RTM_GETEECSTATE message to get SyncE status
On Fri, Sep 03, 2021 at 05:14:35PM +0200, Maciej Machnikowski wrote:
diff --git a/include/uapi/linux/if_link.h b/include/uapi/linux/if_link.h
index eebd3894fe89..78a8a5af8cd8 100644
--- a/include/uapi/linux/if_link.h
+++ b/include/uapi/linux/if_link.h
@@ -1273,4 +1273,35 @@ enum {
#define IFLA_MCTP_MAX (__IFLA_MCTP_MAX - 1)
+/* SyncE section */
+enum if_eec_state {
+	IF_EEC_STATE_INVALID = 0,
+	IF_EEC_STATE_FREERUN,
+	IF_EEC_STATE_LOCKACQ,
+	IF_EEC_STATE_LOCKREC,
+	IF_EEC_STATE_LOCKED,
+	IF_EEC_STATE_HOLDOVER,
+	IF_EEC_STATE_OPEN_LOOP,
+	__IF_EEC_STATE_MAX,
Can you document these states? I'm not clear on what LOCKACQ, LOCKREC and OPEN_LOOP mean. I also don't see ice using them and it's not really a good practice to add new uAPI without any current users.
I'll fix that enum to use generic states defined in G.781 which are limited to:
- Freerun
- LockedACQ (locked, acquiring holdover memory)
- Locked (locked with holdover acquired)
- Holdover
Thanks, it is good to conform to a standard.
Can ice distinguish between LockedACQ and Locked? From G.781 I understand that the former is a transient state. Is the distinction between the two important for user space / the selection operation? If not, maybe we only need Locked?
From v1 I understand that these states were copied from commit 797d3186544f ("ptp: ptp_clockmatrix: Add wait_for_sys_apll_dpll_lock.") from Renesas.
Figure 11 in the following PDF describes the states, but it seems specific to the particular device and probably shouldn't be exposed to user space as-is: https://www.renesas.com/us/en/document/dst/8a34041-datasheet
I have a few questions about this being a per-netdev attribute:
- My understanding is that in the multi-port adapter you are working
with you have a single EEC that is used to set the Tx frequency of all the ports. Is that correct?
That's correct.
- Assuming the above is correct, is it possible that one port is in
LOCKED state and another (for some reason) is in HOLDOVER state? If yes, then I agree about this being a per-netdev attribute. The interface can also be extended with another attribute specifying the HOLDOVER reason. For example, netdev being down.
All ports driven by a single EEC will report the same state.
So maybe we just need to report via ethtool the association between the EEC and the netdev and expose the state as an attribute of the EEC (along with the selected source and other info)?
This is similar to how PHC/netdev association is queried via ethtool. For a given netdev, TSINFO_GET will report the PTP hw clock index via ETHTOOL_A_TSINFO_PHC_INDEX. See Documentation/networking/ethtool-netlink.rst
Regardless, I agree with previous comments made about this belonging in ethtool rather than rtnetlink.
Will take a look at it - as it will require support in linuxptp as well.
+};
+#define IF_EEC_STATE_MAX (__IF_EEC_STATE_MAX - 1)
+
+#define EEC_SRC_PORT	(1 << 0) /* recovered clock from the port is
+				  * currently the source for the EEC
+				  */
I'm not sure about this one. If we are going to expose this as a per-netdev attribute (see more below), any reason it can't be added as another state (e.g., IF_EEC_STATE_LOCKED_SRC)?
OK! That's a great idea! Yet we'll need LOCKED_SRC and LOCKED_ACQ_SRC, but still sounds good.
IIUC, in the common case of a simple NE the source of the EEC is always one of the ports, but in the case of a PRC the source is most likely external (e.g., GPS).
True
If so, I think we need a way to represent the EEC as a separate object with the ability to set its source and report it via the same interface. I'm unclear on how exactly an external source looks like, but for the netdev case maybe something like this:
devlink clock show [ dev clock CLOCK ]
devlink clock set DEV clock CLOCK [ { src_type SRC_TYPE } ]
SRC_TYPE := { port DEV/PORT_INDEX }
Unfortunately, the EEC lives in 2 worlds - it belongs to a netdev (in very simple deployments the EEC may be a part of the PHY and only allow synchronizing the TX frequency of a single lane/port), but it can also go outside of the netdev and be a board-wide frequency source.
The only source type above is 'port' with the ability to set the relevant port, but more can be added. Obviously, 'devlink clock show' will give you the current source in addition to other information such as frequency difference with respect to the input frequency.
We considered a devlink interface for configuring the clock/DPLL, but a new concept was born on the list: adding a DPLL subsystem that will cover more use cases, like the TimeCard.
The reason I suggested devlink is that it is suited for device-wide configuration and it is already used by both MAC drivers and the TimeCard driver. If we have a good reason to create a new generic netlink family for this stuff, then OK.
Finally, can you share more info about the relation to the PHC? My understanding is that one of the primary use cases for SyncE is to drive all the PHCs in the network using the same frequency. How do you imagine this configuration is going to look like? Can the above interface be extended for that?
That would be a configurable parameter/option of the PTP program. Just like it can check for link on a given port, it will also be able to check whether we use the EEC and whether it's locked. If it is, and it is the source of the PHC frequency, the ptp tool can decide not to apply frequency corrections to the PHC, just like ptp4l does when the nullf servo is used, but it can do that dynamically.
The part I don't understand is "is a source of PHC frequency". How do we configure / query that?
On Tue, 21 Sep 2021 17:58:05 +0300 Ido Schimmel wrote:
The only source type above is 'port' with the ability to set the relevant port, but more can be added. Obviously, 'devlink clock show' will give you the current source in addition to other information such as frequency difference with respect to the input frequency.
We considered devlink interface for configuring the clock/DPLL, but a new concept was born at the list to add a DPLL subsystem that will cover more use cases, like a TimeCard.
The reason I suggested devlink is that it is suited for device-wide configuration and it is already used by both MAC drivers and the TimeCard driver. If we have a good reason to create a new generic netlink family for this stuff, then OK.
For NICs, the mapping between devlink instances and HW is not clear. Most register devlink per PCI dev, which usually maps to an Eth port. So if we have one DPLL on a 2 port NIC, the mapping will get icky, no?
Is the motivation to save the boilerplate code associated with new genetlink family or something more?
On Tue, Sep 21, 2021 at 02:14:45PM -0700, Jakub Kicinski wrote:
On Tue, 21 Sep 2021 17:58:05 +0300 Ido Schimmel wrote:
The only source type above is 'port' with the ability to set the relevant port, but more can be added. Obviously, 'devlink clock show' will give you the current source in addition to other information such as frequency difference with respect to the input frequency.
We considered devlink interface for configuring the clock/DPLL, but a new concept was born at the list to add a DPLL subsystem that will cover more use cases, like a TimeCard.
The reason I suggested devlink is that it is suited for device-wide configuration and it is already used by both MAC drivers and the TimeCard driver. If we have a good reason to create a new generic netlink family for this stuff, then OK.
For NICs, the mapping between devlink instances and HW is not clear. Most register devlink per PCI dev, which usually maps to an Eth port. So if we have one DPLL on a 2 port NIC, the mapping will get icky, no?
Yes, having to represent the same EEC in multiple devlink instances is not nice.
Is the motivation to save the boilerplate code associated with new genetlink family or something more?
I don't mind either way. I simply wanted to understand the motivation for not using any existing framework. The above argument is convincing enough, IMO.
Implement SyncE DPLL monitoring for E810-T devices. Poll loop will periodically check the state of the DPLL and cache it in the pf structure. State changes will be logged in the system log.
Cached state can be read using the RTM_GETEECSTATE rtnetlink message.
Signed-off-by: Maciej Machnikowski <maciej.machnikowski@intel.com>
---
 drivers/net/ethernet/intel/ice/ice.h          |  5 ++
 .../net/ethernet/intel/ice/ice_adminq_cmd.h   | 34 ++++++++++
 drivers/net/ethernet/intel/ice/ice_common.c   | 62 +++++++++++++++++++
 drivers/net/ethernet/intel/ice/ice_common.h   |  4 ++
 drivers/net/ethernet/intel/ice/ice_devids.h   |  3 +
 drivers/net/ethernet/intel/ice/ice_main.c     | 29 +++++++++
 drivers/net/ethernet/intel/ice/ice_ptp.c      | 35 +++++++++++
 drivers/net/ethernet/intel/ice/ice_ptp_hw.c   | 44 +++++++++++++
 drivers/net/ethernet/intel/ice/ice_ptp_hw.h   | 22 +++++++
 9 files changed, 238 insertions(+)
diff --git a/drivers/net/ethernet/intel/ice/ice.h b/drivers/net/ethernet/intel/ice/ice.h
index eadcb9958346..6fb7e07e8a62 100644
--- a/drivers/net/ethernet/intel/ice/ice.h
+++ b/drivers/net/ethernet/intel/ice/ice.h
@@ -508,6 +508,11 @@ struct ice_pf {
 #define ICE_VF_AGG_NODE_ID_START	65
 #define ICE_MAX_VF_AGG_NODES		32
 	struct ice_agg_node vf_agg_node[ICE_MAX_VF_AGG_NODES];
+
+	enum if_eec_state synce_dpll_state;
+	u8 synce_dpll_pin;
+	enum if_eec_state ptp_dpll_state;
+	u8 ptp_dpll_pin;
 };
 struct ice_netdev_priv {
diff --git a/drivers/net/ethernet/intel/ice/ice_adminq_cmd.h b/drivers/net/ethernet/intel/ice/ice_adminq_cmd.h
index 21b4c7cd6f05..b84da5e9d025 100644
--- a/drivers/net/ethernet/intel/ice/ice_adminq_cmd.h
+++ b/drivers/net/ethernet/intel/ice/ice_adminq_cmd.h
@@ -1727,6 +1727,36 @@ struct ice_aqc_add_rdma_qset_data {
 	struct ice_aqc_add_tx_rdma_qset_entry rdma_qsets[];
 };
+/* Get CGU DPLL status (direct 0x0C66) */
+struct ice_aqc_get_cgu_dpll_status {
+	u8 dpll_num;
+	u8 ref_state;
+#define ICE_AQC_GET_CGU_DPLL_STATUS_REF_SW_LOS		BIT(0)
+#define ICE_AQC_GET_CGU_DPLL_STATUS_REF_SW_SCM		BIT(1)
+#define ICE_AQC_GET_CGU_DPLL_STATUS_REF_SW_CFM		BIT(2)
+#define ICE_AQC_GET_CGU_DPLL_STATUS_REF_SW_GST		BIT(3)
+#define ICE_AQC_GET_CGU_DPLL_STATUS_REF_SW_PFM		BIT(4)
+#define ICE_AQC_GET_CGU_DPLL_STATUS_REF_SW_ESYNC	BIT(6)
+#define ICE_AQC_GET_CGU_DPLL_STATUS_FAST_LOCK_EN	BIT(7)
+	__le16 dpll_state;
+#define ICE_AQC_GET_CGU_DPLL_STATUS_STATE_LOCK		BIT(0)
+#define ICE_AQC_GET_CGU_DPLL_STATUS_STATE_HO		BIT(1)
+#define ICE_AQC_GET_CGU_DPLL_STATUS_STATE_HO_READY	BIT(2)
+#define ICE_AQC_GET_CGU_DPLL_STATUS_STATE_FLHIT		BIT(5)
+#define ICE_AQC_GET_CGU_DPLL_STATUS_STATE_PSLHIT	BIT(7)
+#define ICE_AQC_GET_CGU_DPLL_STATUS_STATE_CLK_REF_SHIFT	8
+#define ICE_AQC_GET_CGU_DPLL_STATUS_STATE_CLK_REF_SEL \
+	ICE_M(0x1F, ICE_AQC_GET_CGU_DPLL_STATUS_STATE_CLK_REF_SHIFT)
+#define ICE_AQC_GET_CGU_DPLL_STATUS_STATE_MODE_SHIFT	13
+#define ICE_AQC_GET_CGU_DPLL_STATUS_STATE_MODE \
+	ICE_M(0x7, ICE_AQC_GET_CGU_DPLL_STATUS_STATE_MODE_SHIFT)
+	__le32 phase_offset_h;
+	__le32 phase_offset_l;
+	u8 eec_mode;
+	u8 rsvd[1];
+	__le16 node_handle;
+};
+
 /* Configure Firmware Logging Command (indirect 0xFF09)
  * Logging Information Read Response (indirect 0xFF10)
  * Note: The 0xFF10 command has no input parameters.
@@ -1954,6 +1984,7 @@ struct ice_aq_desc {
 		struct ice_aqc_fw_logging fw_logging;
 		struct ice_aqc_get_clear_fw_log get_clear_fw_log;
 		struct ice_aqc_download_pkg download_pkg;
+		struct ice_aqc_get_cgu_dpll_status get_cgu_dpll_status;
 		struct ice_aqc_driver_shared_params drv_shared_params;
 		struct ice_aqc_set_mac_lb set_mac_lb;
 		struct ice_aqc_alloc_free_res_cmd sw_res_ctrl;
@@ -2108,6 +2139,9 @@ enum ice_adminq_opc {
 	ice_aqc_opc_update_pkg			= 0x0C42,
 	ice_aqc_opc_get_pkg_info_list		= 0x0C43,
+	/* 1588/SyncE commands/events */
+	ice_aqc_opc_get_cgu_dpll_status		= 0x0C66,
+
 	ice_aqc_opc_driver_shared_params	= 0x0C90,
 	/* Standalone Commands/Events */
diff --git a/drivers/net/ethernet/intel/ice/ice_common.c b/drivers/net/ethernet/intel/ice/ice_common.c
index 2fb81e359cdf..e7474643a421 100644
--- a/drivers/net/ethernet/intel/ice/ice_common.c
+++ b/drivers/net/ethernet/intel/ice/ice_common.c
@@ -69,6 +69,31 @@ bool ice_is_e810(struct ice_hw *hw)
 	return hw->mac_type == ICE_MAC_E810;
 }
+/**
+ * ice_is_e810t
+ * @hw: pointer to the hardware structure
+ *
+ * returns true if the device is E810T based, false if not.
+ */
+bool ice_is_e810t(struct ice_hw *hw)
+{
+	switch (hw->device_id) {
+	case ICE_DEV_ID_E810C_SFP:
+		if (hw->subsystem_device_id == ICE_SUBDEV_ID_E810T ||
+		    hw->subsystem_device_id == ICE_SUBDEV_ID_E810T2)
+			return true;
+		break;
+	case ICE_DEV_ID_E810C_QSFP:
+		if (hw->subsystem_device_id == ICE_SUBDEV_ID_E810T2)
+			return true;
+		break;
+	default:
+		break;
+	}
+
+	return false;
+}
+
 /**
  * ice_clear_pf_cfg - Clear PF configuration
  * @hw: pointer to the hardware structure
@@ -4520,6 +4545,42 @@ ice_dis_vsi_rdma_qset(struct ice_port_info *pi, u16 count, u32 *qset_teid,
 	return ice_status_to_errno(status);
 }
+/**
+ * ice_aq_get_cgu_dpll_status
+ * @hw: pointer to the HW struct
+ * @dpll_num: DPLL index
+ * @ref_state: Reference clock state
+ * @dpll_state: DPLL state
+ * @phase_offset: Phase offset in ps
+ * @eec_mode: EEC_mode
+ *
+ * Get CGU DPLL status (0x0C66)
+ */
+enum ice_status
+ice_aq_get_cgu_dpll_status(struct ice_hw *hw, u8 dpll_num, u8 *ref_state,
+			   u16 *dpll_state, u64 *phase_offset, u8 *eec_mode)
+{
+	struct ice_aqc_get_cgu_dpll_status *cmd;
+	struct ice_aq_desc desc;
+	enum ice_status status;
+
+	ice_fill_dflt_direct_cmd_desc(&desc, ice_aqc_opc_get_cgu_dpll_status);
+	cmd = &desc.params.get_cgu_dpll_status;
+	cmd->dpll_num = dpll_num;
+
+	status = ice_aq_send_cmd(hw, &desc, NULL, 0, NULL);
+	if (!status) {
+		*ref_state = cmd->ref_state;
+		*dpll_state = le16_to_cpu(cmd->dpll_state);
+		*phase_offset = le32_to_cpu(cmd->phase_offset_h);
+		*phase_offset <<= 32;
+		*phase_offset += le32_to_cpu(cmd->phase_offset_l);
+		*eec_mode = cmd->eec_mode;
+	}
+
+	return status;
+}
+
 /**
  * ice_replay_pre_init - replay pre initialization
  * @hw: pointer to the HW struct
@@ -4974,3 +5035,4 @@ bool ice_fw_supports_report_dflt_cfg(struct ice_hw *hw)
 	}
 	return false;
 }
+
diff --git a/drivers/net/ethernet/intel/ice/ice_common.h b/drivers/net/ethernet/intel/ice/ice_common.h
index fb16070f02e2..ccd76c0cbf2c 100644
--- a/drivers/net/ethernet/intel/ice/ice_common.h
+++ b/drivers/net/ethernet/intel/ice/ice_common.h
@@ -100,6 +100,7 @@ enum ice_status
 ice_aq_manage_mac_write(struct ice_hw *hw, const u8 *mac_addr, u8 flags,
 			struct ice_sq_cd *cd);
 bool ice_is_e810(struct ice_hw *hw);
+bool ice_is_e810t(struct ice_hw *hw);
 enum ice_status ice_clear_pf_cfg(struct ice_hw *hw);
 enum ice_status
 ice_aq_set_phy_cfg(struct ice_hw *hw, struct ice_port_info *pi,
@@ -156,6 +157,9 @@ ice_cfg_vsi_rdma(struct ice_port_info *pi, u16 vsi_handle, u16 tc_bitmap,
 int
 ice_ena_vsi_rdma_qset(struct ice_port_info *pi, u16 vsi_handle, u8 tc,
 		      u16 *rdma_qset, u16 num_qsets, u32 *qset_teid);
+enum ice_status
+ice_aq_get_cgu_dpll_status(struct ice_hw *hw, u8 dpll_num, u8 *ref_state,
+			   u16 *dpll_state, u64 *phase_offset, u8 *eec_mode);
 int
 ice_dis_vsi_rdma_qset(struct ice_port_info *pi, u16 count, u32 *qset_teid,
 		      u16 *q_id);
diff --git a/drivers/net/ethernet/intel/ice/ice_devids.h b/drivers/net/ethernet/intel/ice/ice_devids.h
index 9d8194671f6a..e52dbeddb783 100644
--- a/drivers/net/ethernet/intel/ice/ice_devids.h
+++ b/drivers/net/ethernet/intel/ice/ice_devids.h
@@ -52,4 +52,7 @@
 /* Intel(R) Ethernet Connection E822-L 1GbE */
 #define ICE_DEV_ID_E822L_SGMII		0x189A
+#define ICE_SUBDEV_ID_E810T	0x000E
+#define ICE_SUBDEV_ID_E810T2	0x000F
+
 #endif /* _ICE_DEVIDS_H_ */
diff --git a/drivers/net/ethernet/intel/ice/ice_main.c b/drivers/net/ethernet/intel/ice/ice_main.c
index 0d6c143f6653..62a011f135bd 100644
--- a/drivers/net/ethernet/intel/ice/ice_main.c
+++ b/drivers/net/ethernet/intel/ice/ice_main.c
@@ -5978,6 +5978,34 @@ static void ice_napi_disable_all(struct ice_vsi *vsi)
 	}
 }
+/**
+ * ice_get_eec_state - get state of SyncE DPLL
+ * @netdev: network interface device structure
+ * @state: state of SyncE DPLL
+ * @sync_src: port is a source of the sync signal
+ */
+static int
+ice_get_eec_state(struct net_device *netdev, enum if_eec_state *state,
+		  u32 *eec_flags, struct netlink_ext_ack *extack)
+{
+	struct ice_netdev_priv *np = netdev_priv(netdev);
+	struct ice_vsi *vsi = np->vsi;
+	struct ice_pf *pf = vsi->back;
+
+	if (!ice_is_e810t(&pf->hw))
+		return -EOPNOTSUPP;
+
+	*state = pf->synce_dpll_state;
+
+	*eec_flags = 0;
+	if (pf->synce_dpll_pin == REF1P || pf->synce_dpll_pin == REF1N) {
+		/* Add check if the current PF drives the EEC clock */
+		*eec_flags |= EEC_SRC_PORT;
+	}
+
+	return 0;
+}
+
 /**
  * ice_down - Shutdown the connection
  * @vsi: The VSI being stopped
@@ -7268,4 +7296,5 @@ static const struct net_device_ops ice_netdev_ops = {
 	.ndo_bpf = ice_xdp,
 	.ndo_xdp_xmit = ice_xdp_xmit,
 	.ndo_xsk_wakeup = ice_xsk_wakeup,
+	.ndo_get_eec_state = ice_get_eec_state,
 };
diff --git a/drivers/net/ethernet/intel/ice/ice_ptp.c b/drivers/net/ethernet/intel/ice/ice_ptp.c
index 05cc5870e4ef..ccbc2d1cc754 100644
--- a/drivers/net/ethernet/intel/ice/ice_ptp.c
+++ b/drivers/net/ethernet/intel/ice/ice_ptp.c
@@ -1409,6 +1409,36 @@ static void ice_ptp_tx_tstamp_cleanup(struct ice_ptp_tx *tx)
 	}
 }
+static void ice_handle_cgu_state(struct ice_pf *pf)
+{
+	enum if_eec_state cgu_state;
+	u8 pin;
+
+	cgu_state = ice_get_dpll_state(&pf->hw, ICE_CGU_DPLL_SYNCE, &pin);
+	if (pf->synce_dpll_state != cgu_state) {
+		pf->synce_dpll_state = cgu_state;
+		pf->synce_dpll_pin = pin;
+
+		dev_warn(ice_pf_to_dev(pf),
+			 "<DPLL%i> state changed to: %d, pin %d",
+			 ICE_CGU_DPLL_SYNCE,
+			 pf->synce_dpll_state,
+			 pin);
+	}
+
+	cgu_state = ice_get_dpll_state(&pf->hw, ICE_CGU_DPLL_PTP, &pin);
+	if (pf->ptp_dpll_state != cgu_state) {
+		pf->ptp_dpll_state = cgu_state;
+		pf->ptp_dpll_pin = pin;
+
+		dev_warn(ice_pf_to_dev(pf),
+			 "<DPLL%i> state changed to: %d, pin %d",
+			 ICE_CGU_DPLL_PTP,
+			 pf->ptp_dpll_state,
+			 pin);
+	}
+}
+
 static void ice_ptp_periodic_work(struct kthread_work *work)
 {
 	struct ice_ptp *ptp = container_of(work, struct ice_ptp, work.work);
@@ -1417,6 +1447,10 @@ static void ice_ptp_periodic_work(struct kthread_work *work)
 	if (!test_bit(ICE_FLAG_PTP, pf->flags))
 		return;
+	if (ice_is_e810t(&pf->hw) &&
+	    pf->hw.func_caps.ts_func_info.src_tmr_owned)
+		ice_handle_cgu_state(pf);
+
 	ice_ptp_update_cached_phctime(pf);
 	ice_ptp_tx_tstamp_cleanup(&pf->ptp.port.tx);
@@ -1598,3 +1632,4 @@ void ice_ptp_release(struct ice_pf *pf)
 	dev_info(ice_pf_to_dev(pf), "Removed PTP clock\n");
 }
+
diff --git a/drivers/net/ethernet/intel/ice/ice_ptp_hw.c b/drivers/net/ethernet/intel/ice/ice_ptp_hw.c
index 3eca0e4eab0b..df09f176512b 100644
--- a/drivers/net/ethernet/intel/ice/ice_ptp_hw.c
+++ b/drivers/net/ethernet/intel/ice/ice_ptp_hw.c
@@ -375,6 +375,50 @@ static int ice_ptp_port_cmd_e810(struct ice_hw *hw, enum ice_ptp_tmr_cmd cmd)
 	return 0;
 }
+/**
+ * ice_get_dpll_state - get the state of the DPLL
+ * @hw: pointer to the hw struct
+ * @dpll_idx: Index of internal DPLL unit
+ * @pin: pointer to a buffer for returning currently active pin
+ *
+ * This function will read the state of the DPLL(dpll_idx). If optional
+ * parameter pin is given it'll be used to retrieve currently active pin.
+ *
+ * Return: state of the DPLL
+ */
+enum if_eec_state
+ice_get_dpll_state(struct ice_hw *hw, u8 dpll_idx, u8 *pin)
+{
+	enum ice_status status;
+	u64 phase_offset;
+	u16 dpll_state;
+	u8 ref_state;
+	u8 eec_mode;
+
+	if (dpll_idx >= ICE_CGU_DPLL_MAX)
+		return IF_EEC_STATE_INVALID;
+
+	status = ice_aq_get_cgu_dpll_status(hw, dpll_idx, &ref_state,
+					    &dpll_state, &phase_offset,
+					    &eec_mode);
+	if (status)
+		return IF_EEC_STATE_INVALID;
+
+	if (pin) {
+		/* current ref pin in dpll_state_refsel_status_X register */
+		*pin = (dpll_state &
+			ICE_AQC_GET_CGU_DPLL_STATUS_STATE_CLK_REF_SEL) >>
+		       ICE_AQC_GET_CGU_DPLL_STATUS_STATE_CLK_REF_SHIFT;
+	}
+
+	if (dpll_state & ICE_AQC_GET_CGU_DPLL_STATUS_STATE_LOCK)
+		return IF_EEC_STATE_LOCKED;
+	else if (dpll_state & ICE_AQC_GET_CGU_DPLL_STATUS_STATE_HO)
+		return IF_EEC_STATE_HOLDOVER;
+
+	return IF_EEC_STATE_FREERUN;
+}
+
 /* Device agnostic functions
  *
  * The following functions implement useful behavior to hide the differences
diff --git a/drivers/net/ethernet/intel/ice/ice_ptp_hw.h b/drivers/net/ethernet/intel/ice/ice_ptp_hw.h
index 55a414e87018..cc5dc79fc926 100644
--- a/drivers/net/ethernet/intel/ice/ice_ptp_hw.h
+++ b/drivers/net/ethernet/intel/ice/ice_ptp_hw.h
@@ -30,6 +30,8 @@ int ice_clear_phy_tstamp(struct ice_hw *hw, u8 block, u8 idx);
 /* E810 family functions */
 int ice_ptp_init_phy_e810(struct ice_hw *hw);
+enum if_eec_state
+ice_get_dpll_state(struct ice_hw *hw, u8 dpll_idx, u8 *pin);
#define PFTSYN_SEM_BYTES 4
@@ -76,4 +78,24 @@ int ice_ptp_init_phy_e810(struct ice_hw *hw);
 #define LOW_TX_MEMORY_BANK_START	0x03090000
 #define HIGH_TX_MEMORY_BANK_START	0x03090004
+
+enum ice_e810t_cgu_dpll {
+	ICE_CGU_DPLL_SYNCE,
+	ICE_CGU_DPLL_PTP,
+	ICE_CGU_DPLL_MAX
+};
+
+enum ice_e810t_cgu_pins {
+	REF0P,
+	REF0N,
+	REF1P,
+	REF1N,
+	REF2P,
+	REF2N,
+	REF3P,
+	REF3N,
+	REF4P,
+	REF4N,
+	NUM_E810T_CGU_PINS
+};
+
 #endif /* _ICE_PTP_HW_H_ */
On Fri, 3 Sep 2021 17:14:36 +0200 Maciej Machnikowski wrote:
Implement SyncE DPLL monitoring for E810-T devices. Poll loop will periodically check the state of the DPLL and cache it in the pf structure. State changes will be logged in the system log.
Cached state can be read using the RTM_GETEECSTATE rtnetlink message.
Signed-off-by: Maciej Machnikowski <maciej.machnikowski@intel.com>
kdoc issues here:
drivers/net/ethernet/intel/ice/ice_main.c:5990: warning: Function parameter or member 'eec_flags' not described in 'ice_get_eec_state'
drivers/net/ethernet/intel/ice/ice_main.c:5990: warning: Function parameter or member 'extack' not described in 'ice_get_eec_state'
drivers/net/ethernet/intel/ice/ice_main.c:5990: warning: Excess function parameter 'sync_src' description in 'ice_get_eec_state'
./scripts/kernel-doc -none and/or W=1 is your friend.
On Fri, Sep 03, 2021 at 05:14:36PM +0200, Maciej Machnikowski wrote:
Implement SyncE DPLL monitoring for E810-T devices. Poll loop will periodically check the state of the DPLL and cache it in the pf structure. State changes will be logged in the system log.
Cached state can be read using the RTM_GETEECSTATE rtnetlink message.
This seems sub-optimal. My understanding is that this information is of importance to the user space process that takes care of the ESMC protocol. It would probably want to receive a notification when the state of the DPLL/EEC changes as opposed to polling the kernel or poking into its log.
So I think that whatever netlink-based interface we agree on to represent the EEC (devlink or something else), should have the ability to send a notification when the state changes.