This is a note to let you know that I've just added the patch titled
perf evsel: Fix swap for samples with raw data
to the 4.15-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
perf-evsel-fix-swap-for-samples-with-raw-data.patch
and it can be found in the queue-4.15 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From foo@baz Mon Apr 9 10:16:32 CEST 2018
From: Jiri Olsa <jolsa(a)kernel.org>
Date: Wed, 29 Nov 2017 19:43:46 +0100
Subject: perf evsel: Fix swap for samples with raw data
From: Jiri Olsa <jolsa(a)kernel.org>
[ Upstream commit f9d8adb345d7adbb2d3431eea73beb89c8d6d612 ]
When we detect a different endianness we swap the event before processing.
It's tricky for samples because we have no idea what's inside. We treat
it as an array of u64s, swap them and later on we swap back parts which
are different.
This also mangles the tracepoint raw data, so perf report ends up
showing wrong data:
1.95% comm=Q^B pid=29285 prio=16777216 target_cpu=000
1.67% comm=l^B pid=0 prio=16777216 target_cpu=000
Luckily the traceevent library handles the endianness by itself (thank
you Steven!), so we can pass the RAW data directly in the other
endianness.
2.51% comm=beah-rhts-task pid=1175 prio=120 target_cpu=002
2.23% comm=kworker/0:0 pid=11566 prio=120 target_cpu=000
The fix is basically to swap back the raw data if a different endianness
is detected.
Signed-off-by: Jiri Olsa <jolsa(a)kernel.org>
Cc: David Ahern <dsahern(a)gmail.com>
Cc: Namhyung Kim <namhyung(a)kernel.org>
Cc: Peter Zijlstra <peterz(a)infradead.org>
Cc: Steven Rostedt <rostedt(a)goodmis.org>
Link: http://lkml.kernel.org/r/20171129184346.3656-1-jolsa@kernel.org
[ Add util/memswap.c to python-ext-sources to link missing mem_bswap_64() ]
Signed-off-by: Arnaldo Carvalho de Melo <acme(a)redhat.com>
Signed-off-by: Sasha Levin <alexander.levin(a)microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
tools/perf/util/evsel.c | 20 +++++++++++++++++---
tools/perf/util/python-ext-sources | 1 +
2 files changed, 18 insertions(+), 3 deletions(-)
--- a/tools/perf/util/evsel.c
+++ b/tools/perf/util/evsel.c
@@ -36,6 +36,7 @@
#include "debug.h"
#include "trace-event.h"
#include "stat.h"
+#include "memswap.h"
#include "util/parse-branch-options.h"
#include "sane_ctype.h"
@@ -2120,14 +2121,27 @@ int perf_evsel__parse_sample(struct perf
if (type & PERF_SAMPLE_RAW) {
OVERFLOW_CHECK_u64(array);
u.val64 = *array;
- if (WARN_ONCE(swapped,
- "Endianness of raw data not corrected!\n")) {
- /* undo swap of u64, then swap on individual u32s */
+
+ /*
+ * Undo swap of u64, then swap on individual u32s,
+ * get the size of the raw area and undo all of the
+ * swap. The pevent interface handles endianity by
+ * itself.
+ */
+ if (swapped) {
u.val64 = bswap_64(u.val64);
u.val32[0] = bswap_32(u.val32[0]);
u.val32[1] = bswap_32(u.val32[1]);
}
data->raw_size = u.val32[0];
+
+ /*
+ * The raw data is aligned on 64bits including the
+ * u32 size, so it's safe to use mem_bswap_64.
+ */
+ if (swapped)
+ mem_bswap_64((void *) array, data->raw_size);
+
array = (void *)array + sizeof(u32);
OVERFLOW_CHECK(array, data->raw_size, max_size);
--- a/tools/perf/util/python-ext-sources
+++ b/tools/perf/util/python-ext-sources
@@ -10,6 +10,7 @@ util/ctype.c
util/evlist.c
util/evsel.c
util/cpumap.c
+util/memswap.c
util/mmap.c
util/namespaces.c
../lib/bitmap.c
Patches currently in stable-queue which might be from jolsa(a)kernel.org are
queue-4.15/perf-evsel-fix-swap-for-samples-with-raw-data.patch
queue-4.15/perf-tools-fix-copyfile_offset-update-of-output-offset.patch
queue-4.15/perf-report-fix-a-no-annotate-browser-displayed-issue.patch
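The swap-back described in the commit message above can be sketched in user space as follows. This is only a minimal sketch, not the perf code: sketch_bswap_64(), sketch_mem_bswap_64() and sketch_raw_roundtrip() are hypothetical stand-ins, and the 16-byte buffer is an assumed example of a raw area whose size is a multiple of 8. It shows that swapping the raw area a second time restores the bytes that the generic u64 swap mangled.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Reverse the byte order of one 64-bit value. */
static uint64_t sketch_bswap_64(uint64_t v)
{
	uint64_t r = 0;

	for (int i = 0; i < 8; i++) {
		r = (r << 8) | (v & 0xff);
		v >>= 8;
	}
	return r;
}

/* Hypothetical stand-in for mem_bswap_64(): swap every u64 in place. */
static void sketch_mem_bswap_64(void *buf, size_t byte_size)
{
	uint8_t *p = buf;

	/* The sketch assumes the area is 64-bit aligned in size. */
	assert(byte_size % sizeof(uint64_t) == 0);
	for (size_t off = 0; off < byte_size; off += sizeof(uint64_t)) {
		uint64_t v;

		memcpy(&v, p + off, sizeof(v));
		v = sketch_bswap_64(v);
		memcpy(p + off, &v, sizeof(v));
	}
}

/* Returns 1 when a second swap restores the original raw bytes. */
int sketch_raw_roundtrip(void)
{
	uint8_t raw[16], saved[16];

	for (int i = 0; i < 16; i++)
		raw[i] = (uint8_t)i;
	memcpy(saved, raw, sizeof(raw));

	/* Generic sample swap: the raw bytes are now mangled. */
	sketch_mem_bswap_64(raw, sizeof(raw));
	if (raw[0] != 7 || raw[8] != 15)
		return 0;

	/* The fix: swap the raw area back to its recorded byte order. */
	sketch_mem_bswap_64(raw, sizeof(raw));
	return memcmp(raw, saved, sizeof(raw)) == 0;
}
```

The raw area is then handed over in its recorded byte order, which matches the commit message's point that the traceevent library deals with the endianness itself.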
This is a note to let you know that I've just added the patch titled
perf evsel: Enable ignore_missing_thread for pid option
to the 4.15-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
perf-evsel-enable-ignore_missing_thread-for-pid-option.patch
and it can be found in the queue-4.15 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From foo@baz Mon Apr 9 10:16:32 CEST 2018
From: Mengting Zhang <zhangmengting(a)huawei.com>
Date: Wed, 13 Dec 2017 15:01:53 +0800
Subject: perf evsel: Enable ignore_missing_thread for pid option
From: Mengting Zhang <zhangmengting(a)huawei.com>
[ Upstream commit ca8000684ec4e66f965e1f9547a3c6cb834154ca ]
While monitoring a multithreaded process with the pid option, perf may
sometimes return a sys_perf_event_open failure with 3 (No such process)
if any of the process's threads die before we open the event. However,
we want perf to continue monitoring the remaining threads and not exit
with an error.
Here, the patch enables perf_evsel::ignore_missing_thread for the -p
option to ignore the complete failure if any of the threads die before we
open the event. But it may still return a sys_perf_event_open failure
with 22 (Invalid) if we monitor several event groups.
sys_perf_event_open: pid 28960 cpu 40 group_fd 118202 flags 0x8
sys_perf_event_open: pid 28961 cpu 40 group_fd 118203 flags 0x8
WARNING: Ignored open failure for pid 28962
sys_perf_event_open: pid 28962 cpu 40 group_fd [118203] flags 0x8
sys_perf_event_open failed, error -22
That is because when we ignore a missing thread, we change the thread_idx
without dealing with its fds, FD(evsel, cpu, thread). Then get_group_fd()
may return a wrong group_fd for the next thread and sys_perf_event_open()
returns with 22.
sys_perf_event_open(){
...
if (group_fd != -1)
perf_fget_light()//to get corresponding group_leader by group_fd
...
if (group_leader)
if (group_leader->ctx->task != ctx->task)//should be on the same task
goto err_context
...
}
This patch also fixes this bug by introducing perf_evsel__remove_fd() and
update_fds to allow removing fds for the missing thread.
Changes since v1:
- Change group_fd__remove() into a more generic way without changing code logic
- Remove redundant condition
Changes since v2:
- Use a proper function name and add some comment.
- Multiline comment style fixes.
Committer testing:
Before this patch the recently added 'perf stat --per-thread' for system
wide counting would race while enumerating all threads using /proc:
[root@jouet ~]# perf stat --per-thread
failed to parse CPUs map: No such file or directory
Usage: perf stat [<options>] [<command>]
-C, --cpu <cpu> list of cpus to monitor in system-wide
-a, --all-cpus system-wide collection from all CPUs
[root@jouet ~]# perf stat --per-thread
failed to parse CPUs map: No such file or directory
Usage: perf stat [<options>] [<command>]
-C, --cpu <cpu> list of cpus to monitor in system-wide
-a, --all-cpus system-wide collection from all CPUs
[root@jouet ~]#
This happened when, say, the kernel was being built, so there were lots
of short-lived threads; after this patch it no longer happens.
Signed-off-by: Mengting Zhang <zhangmengting(a)huawei.com>
Tested-by: Arnaldo Carvalho de Melo <acme(a)redhat.com>
Acked-by: Jiri Olsa <jolsa(a)redhat.com>
Cc: Cheng Jian <cj.chengjian(a)huawei.com>
Cc: Li Bin <huawei.libin(a)huawei.com>
Cc: Wang Nan <wangnan0(a)huawei.com>
Link: http://lkml.kernel.org/r/1513148513-6974-1-git-send-email-zhangmengting@hua…
[ Remove one use 'evlist' alias variable ]
Signed-off-by: Arnaldo Carvalho de Melo <acme(a)redhat.com>
Signed-off-by: Sasha Levin <alexander.levin(a)microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
tools/perf/builtin-record.c | 4 +--
tools/perf/util/evsel.c | 47 ++++++++++++++++++++++++++++++++++++++++++--
2 files changed, 47 insertions(+), 4 deletions(-)
--- a/tools/perf/builtin-record.c
+++ b/tools/perf/builtin-record.c
@@ -1781,8 +1781,8 @@ int cmd_record(int argc, const char **ar
goto out;
}
- /* Enable ignoring missing threads when -u option is defined. */
- rec->opts.ignore_missing_thread = rec->opts.target.uid != UINT_MAX;
+ /* Enable ignoring missing threads when -u/-p option is defined. */
+ rec->opts.ignore_missing_thread = rec->opts.target.uid != UINT_MAX || rec->opts.target.pid;
err = -ENOMEM;
if (perf_evlist__create_maps(rec->evlist, &rec->opts.target) < 0)
--- a/tools/perf/util/evsel.c
+++ b/tools/perf/util/evsel.c
@@ -1597,10 +1597,46 @@ static int __open_attr__fprintf(FILE *fp
return fprintf(fp, " %-32s %s\n", name, val);
}
+static void perf_evsel__remove_fd(struct perf_evsel *pos,
+ int nr_cpus, int nr_threads,
+ int thread_idx)
+{
+ for (int cpu = 0; cpu < nr_cpus; cpu++)
+ for (int thread = thread_idx; thread < nr_threads - 1; thread++)
+ FD(pos, cpu, thread) = FD(pos, cpu, thread + 1);
+}
+
+static int update_fds(struct perf_evsel *evsel,
+ int nr_cpus, int cpu_idx,
+ int nr_threads, int thread_idx)
+{
+ struct perf_evsel *pos;
+
+ if (cpu_idx >= nr_cpus || thread_idx >= nr_threads)
+ return -EINVAL;
+
+ evlist__for_each_entry(evsel->evlist, pos) {
+ nr_cpus = pos != evsel ? nr_cpus : cpu_idx;
+
+ perf_evsel__remove_fd(pos, nr_cpus, nr_threads, thread_idx);
+
+ /*
+ * Since fds for next evsel has not been created,
+ * there is no need to iterate whole event list.
+ */
+ if (pos == evsel)
+ break;
+ }
+ return 0;
+}
+
static bool ignore_missing_thread(struct perf_evsel *evsel,
+ int nr_cpus, int cpu,
struct thread_map *threads,
int thread, int err)
{
+ pid_t ignore_pid = thread_map__pid(threads, thread);
+
if (!evsel->ignore_missing_thread)
return false;
@@ -1616,11 +1652,18 @@ static bool ignore_missing_thread(struct
if (threads->nr == 1)
return false;
+ /*
+ * We should remove fd for missing_thread first
+ * because thread_map__remove() will decrease threads->nr.
+ */
+ if (update_fds(evsel, nr_cpus, cpu, threads->nr, thread))
+ return false;
+
if (thread_map__remove(threads, thread))
return false;
pr_warning("WARNING: Ignored open failure for pid %d\n",
- thread_map__pid(threads, thread));
+ ignore_pid);
return true;
}
@@ -1725,7 +1768,7 @@ retry_open:
if (fd < 0) {
err = -errno;
- if (ignore_missing_thread(evsel, threads, thread, err)) {
+ if (ignore_missing_thread(evsel, cpus->nr, cpu, threads, thread, err)) {
/*
* We just removed 1 thread, so take a step
* back on thread index and lower the upper
Patches currently in stable-queue which might be from zhangmengting(a)huawei.com are
queue-4.15/perf-evsel-enable-ignore_missing_thread-for-pid-option.patch
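The fd bookkeeping that the commit message above describes can be sketched as follows. This is a minimal sketch under stated assumptions: struct sketch_fd_table, the SKETCH_* limits and sketch_remove_thread() are hypothetical names, and the sketch folds the thread-count decrement into the same function, whereas the patch leaves that to thread_map__remove(). It shows why the column of the missing thread has to be dropped: otherwise the next thread would be paired with a stale group fd.

```c
#include <assert.h>

#define SKETCH_MAX_CPUS    4
#define SKETCH_MAX_THREADS 8

/* Hypothetical fd table: one row per CPU, one column per thread. */
struct sketch_fd_table {
	int nr_cpus;
	int nr_threads;
	int fd[SKETCH_MAX_CPUS][SKETCH_MAX_THREADS];
};

/*
 * Drop the column of a missing thread by shifting later columns left.
 * Returns 0 on success, -1 if thread_idx is out of range.
 */
int sketch_remove_thread(struct sketch_fd_table *t, int thread_idx)
{
	assert(t->nr_cpus <= SKETCH_MAX_CPUS);
	assert(t->nr_threads <= SKETCH_MAX_THREADS);

	if (thread_idx < 0 || thread_idx >= t->nr_threads)
		return -1;

	for (int cpu = 0; cpu < t->nr_cpus; cpu++)
		for (int thread = thread_idx; thread < t->nr_threads - 1; thread++)
			t->fd[cpu][thread] = t->fd[cpu][thread + 1];

	t->nr_threads--;
	return 0;
}
```

After the removal, index N in the table again refers to the same thread as index N in the shrunken thread map.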
This is a note to let you know that I've just added the patch titled
nvme_fcloop: fix abort race condition
to the 4.15-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
nvme_fcloop-fix-abort-race-condition.patch
and it can be found in the queue-4.15 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From foo@baz Mon Apr 9 10:16:32 CEST 2018
From: James Smart <jsmart2021(a)gmail.com>
Date: Wed, 29 Nov 2017 16:47:30 -0800
Subject: nvme_fcloop: fix abort race condition
From: James Smart <jsmart2021(a)gmail.com>
[ Upstream commit 278e096063f1914fccfc77a617be9fc8dbb31b0e ]
A test case revealed a race condition of an i/o completing on a thread
parallel to the delete_association generating the aborts for the
outstanding ios on the controller. The i/o completion was freeing the
target fcloop context, thus the abort task referenced the just-freed
memory.
Correct by clearing the target/initiator cross pointers in the io
completion and abort tasks before calling the callbacks. On aborts
that detect already finished io's, ensure the complete context is
called.
Signed-off-by: James Smart <james.smart(a)broadcom.com>
Signed-off-by: Christoph Hellwig <hch(a)lst.de>
Signed-off-by: Sasha Levin <alexander.levin(a)microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
drivers/nvme/target/fcloop.c | 12 +++++++-----
1 file changed, 7 insertions(+), 5 deletions(-)
--- a/drivers/nvme/target/fcloop.c
+++ b/drivers/nvme/target/fcloop.c
@@ -374,6 +374,7 @@ fcloop_tgt_fcprqst_done_work(struct work
spin_lock(&tfcp_req->reqlock);
fcpreq = tfcp_req->fcpreq;
+ tfcp_req->fcpreq = NULL;
spin_unlock(&tfcp_req->reqlock);
if (tport->remoteport && fcpreq) {
@@ -615,11 +616,7 @@ fcloop_fcp_abort(struct nvme_fc_local_po
if (!tfcp_req)
/* abort has already been called */
- return;
-
- if (rport->targetport)
- nvmet_fc_rcv_fcp_abort(rport->targetport,
- &tfcp_req->tgt_fcp_req);
+ goto finish;
/* break initiator/target relationship for io */
spin_lock(&tfcp_req->reqlock);
@@ -627,6 +624,11 @@ fcloop_fcp_abort(struct nvme_fc_local_po
tfcp_req->fcpreq = NULL;
spin_unlock(&tfcp_req->reqlock);
+ if (rport->targetport)
+ nvmet_fc_rcv_fcp_abort(rport->targetport,
+ &tfcp_req->tgt_fcp_req);
+
+finish:
/* post the aborted io completion */
fcpreq->status = -ECANCELED;
schedule_work(&inireq->iniwork);
Patches currently in stable-queue which might be from jsmart2021(a)gmail.com are
queue-4.15/nvme_fcloop-fix-abort-race-condition.patch
queue-4.15/nvme_fcloop-disassocate-local-port-structs.patch
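The idea of clearing the cross pointer before calling back, as described in the abort race fix above, can be sketched as follows. Note the swapped-in technique: the driver clears the pointer under a spinlock, while this sketch uses a C11 atomic exchange as a stand-in, and all sketch_* names are hypothetical. Either way, only the first path to claim the pointer gets the request, and any later path sees NULL instead of freed memory.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical initiator-side request. */
struct sketch_req {
	int status;
};

/* Hypothetical target-side context holding the cross pointer. */
struct sketch_tgt_req {
	_Atomic(struct sketch_req *) fcpreq;
};

/*
 * Claim the request and clear the cross pointer in one step.
 * The first caller gets the request, every later caller gets NULL.
 */
struct sketch_req *sketch_take_req(struct sketch_tgt_req *t)
{
	assert(t != NULL);
	return atomic_exchange(&t->fcpreq, (struct sketch_req *)NULL);
}
```

A completion path and an abort path would both call sketch_take_req() and only act on the request when the result is non-NULL.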
This is a note to let you know that I've just added the patch titled
nvme_fcloop: disassocate local port structs
to the 4.15-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
nvme_fcloop-disassocate-local-port-structs.patch
and it can be found in the queue-4.15 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From foo@baz Mon Apr 9 10:16:32 CEST 2018
From: James Smart <jsmart2021(a)gmail.com>
Date: Wed, 29 Nov 2017 16:47:31 -0800
Subject: nvme_fcloop: disassocate local port structs
From: James Smart <jsmart2021(a)gmail.com>
[ Upstream commit 6fda20283e55b9d288cd56822ce39fc8e64f2208 ]
The current fcloop driver gets its lport structure from the private
area co-allocated with the fc_localport. All is fine except the
teardown path, which wants to wait on the completion, which is marked
complete by the delete_localport callback performed after
unregister_localport. The issue is, the nvme_fc transport frees the
localport structure immediately after delete_localport is called,
meaning the original routine is trying to wait on a completion that
was just freed.
Change such that a lport struct is allocated coincident with the
addition and registration of a localport. The private area of the
localport now contains just a backpointer to the real lport struct.
Now, the completion can be waited for, and after completing, the
new structure can be kfree'd.
Signed-off-by: James Smart <james.smart(a)broadcom.com>
Signed-off-by: Christoph Hellwig <hch(a)lst.de>
Signed-off-by: Sasha Levin <alexander.levin(a)microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
drivers/nvme/target/fcloop.c | 35 +++++++++++++++++++++++++----------
1 file changed, 25 insertions(+), 10 deletions(-)
--- a/drivers/nvme/target/fcloop.c
+++ b/drivers/nvme/target/fcloop.c
@@ -204,6 +204,10 @@ struct fcloop_lport {
struct completion unreg_done;
};
+struct fcloop_lport_priv {
+ struct fcloop_lport *lport;
+};
+
struct fcloop_rport {
struct nvme_fc_remote_port *remoteport;
struct nvmet_fc_target_port *targetport;
@@ -657,7 +661,8 @@ fcloop_nport_get(struct fcloop_nport *np
static void
fcloop_localport_delete(struct nvme_fc_local_port *localport)
{
- struct fcloop_lport *lport = localport->private;
+ struct fcloop_lport_priv *lport_priv = localport->private;
+ struct fcloop_lport *lport = lport_priv->lport;
/* release any threads waiting for the unreg to complete */
complete(&lport->unreg_done);
@@ -697,7 +702,7 @@ static struct nvme_fc_port_template fcte
.max_dif_sgl_segments = FCLOOP_SGL_SEGS,
.dma_boundary = FCLOOP_DMABOUND_4G,
/* sizes of additional private data for data structures */
- .local_priv_sz = sizeof(struct fcloop_lport),
+ .local_priv_sz = sizeof(struct fcloop_lport_priv),
.remote_priv_sz = sizeof(struct fcloop_rport),
.lsrqst_priv_sz = sizeof(struct fcloop_lsreq),
.fcprqst_priv_sz = sizeof(struct fcloop_ini_fcpreq),
@@ -728,11 +733,17 @@ fcloop_create_local_port(struct device *
struct fcloop_ctrl_options *opts;
struct nvme_fc_local_port *localport;
struct fcloop_lport *lport;
- int ret;
+ struct fcloop_lport_priv *lport_priv;
+ unsigned long flags;
+ int ret = -ENOMEM;
+
+ lport = kzalloc(sizeof(*lport), GFP_KERNEL);
+ if (!lport)
+ return -ENOMEM;
opts = kzalloc(sizeof(*opts), GFP_KERNEL);
if (!opts)
- return -ENOMEM;
+ goto out_free_lport;
ret = fcloop_parse_options(opts, buf);
if (ret)
@@ -752,23 +763,25 @@ fcloop_create_local_port(struct device *
ret = nvme_fc_register_localport(&pinfo, &fctemplate, NULL, &localport);
if (!ret) {
- unsigned long flags;
-
/* success */
- lport = localport->private;
+ lport_priv = localport->private;
+ lport_priv->lport = lport;
+
lport->localport = localport;
INIT_LIST_HEAD(&lport->lport_list);
spin_lock_irqsave(&fcloop_lock, flags);
list_add_tail(&lport->lport_list, &fcloop_lports);
spin_unlock_irqrestore(&fcloop_lock, flags);
-
- /* mark all of the input buffer consumed */
- ret = count;
}
out_free_opts:
kfree(opts);
+out_free_lport:
+ /* free only if we're going to fail */
+ if (ret)
+ kfree(lport);
+
return ret ? ret : count;
}
@@ -790,6 +803,8 @@ __wait_localport_unreg(struct fcloop_lpo
wait_for_completion(&lport->unreg_done);
+ kfree(lport);
+
return ret;
}
Patches currently in stable-queue which might be from jsmart2021(a)gmail.com are
queue-4.15/nvme_fcloop-fix-abort-race-condition.patch
queue-4.15/nvme_fcloop-disassocate-local-port-structs.patch
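The ownership change described in the local port patch above can be sketched in user space as follows. This is a minimal sketch with hypothetical names: an int flag stands in for the completion, and calloc()/free() stand in for the kernel allocator and for the transport freeing its private area. It shows that the lport survives the release of the private area because the private area only holds a backpointer.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical lport; the flag stands in for the completion. */
struct sketch_lport {
	int unreg_done;
};

/* Private area owned by the transport: just a backpointer. */
struct sketch_lport_priv {
	struct sketch_lport *lport;
};

/* Delete callback: reach the lport via the backpointer and signal it. */
static void sketch_localport_delete(struct sketch_lport_priv *priv)
{
	assert(priv != NULL && priv->lport != NULL);
	priv->lport->unreg_done = 1;
}

/* Returns 1 if the flag can still be read after the private area is gone. */
int sketch_teardown(void)
{
	struct sketch_lport *lport = calloc(1, sizeof(*lport));
	struct sketch_lport_priv *priv = calloc(1, sizeof(*priv));
	int done;

	if (!lport || !priv) {
		free(lport);
		free(priv);
		return -1;
	}
	priv->lport = lport;

	sketch_localport_delete(priv);
	free(priv);			/* transport frees its area right away */

	done = lport->unreg_done;	/* still valid: separately allocated */
	free(lport);			/* waiter frees it when finished */
	return done;
}
```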
This is a note to let you know that I've just added the patch titled
net/mlx5e: IPoIB, Use correct timestamp in child receive flow
to the 4.15-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
net-mlx5e-ipoib-use-correct-timestamp-in-child-receive-flow.patch
and it can be found in the queue-4.15 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From foo@baz Mon Apr 9 10:16:32 CEST 2018
From: Feras Daoud <ferasda(a)mellanox.com>
Date: Tue, 31 Oct 2017 14:57:27 +0200
Subject: net/mlx5e: IPoIB, Use correct timestamp in child receive flow
From: Feras Daoud <ferasda(a)mellanox.com>
[ Upstream commit 36e564b76f1862914ad32c35bab433e07da2ebf8 ]
The current implementation takes the child timestamp object from
the parent since the rq in mlx5i_complete_rx_cqe belongs to the parent.
This change fixes the issue by taking the correct timestamp.
Fixes: 7e7f4780c340 ("net/mlx5e: IPoIB, Use hash-table to map between QPN to child netdev")
Signed-off-by: Feras Daoud <ferasda(a)mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm(a)mellanox.com>
Signed-off-by: Sasha Levin <alexander.levin(a)microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
drivers/net/ethernet/mellanox/mlx5/core/en_rx.c | 7 ++++++-
1 file changed, 6 insertions(+), 1 deletion(-)
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
@@ -1196,7 +1196,9 @@ static inline void mlx5i_complete_rx_cqe
u32 cqe_bcnt,
struct sk_buff *skb)
{
+ struct hwtstamp_config *tstamp;
struct net_device *netdev;
+ struct mlx5e_priv *priv;
char *pseudo_header;
u32 qpn;
u8 *dgid;
@@ -1215,6 +1217,9 @@ static inline void mlx5i_complete_rx_cqe
return;
}
+ priv = mlx5i_epriv(netdev);
+ tstamp = &priv->tstamp;
+
g = (be32_to_cpu(cqe->flags_rqpn) >> 28) & 3;
dgid = skb->data + MLX5_IB_GRH_DGID_OFFSET;
if ((!g) || dgid[0] != 0xff)
@@ -1235,7 +1240,7 @@ static inline void mlx5i_complete_rx_cqe
skb->ip_summed = CHECKSUM_COMPLETE;
skb->csum = csum_unfold((__force __sum16)cqe->check_sum);
- if (unlikely(mlx5e_rx_hw_stamp(rq->tstamp)))
+ if (unlikely(mlx5e_rx_hw_stamp(tstamp)))
skb_hwtstamps(skb)->hwtstamp =
mlx5_timecounter_cyc2time(rq->clock, get_cqe_ts(cqe));
Patches currently in stable-queue which might be from ferasda(a)mellanox.com are
queue-4.15/net-mlx5e-ipoib-use-correct-timestamp-in-child-receive-flow.patch
This is a note to let you know that I've just added the patch titled
net_sch: red: Fix the new offload indication
to the 4.15-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
net_sch-red-fix-the-new-offload-indication.patch
and it can be found in the queue-4.15 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From foo@baz Mon Apr 9 10:16:32 CEST 2018
From: Nogah Frankel <nogahf(a)mellanox.com>
Date: Mon, 25 Dec 2017 10:51:41 +0200
Subject: net_sch: red: Fix the new offload indication
From: Nogah Frankel <nogahf(a)mellanox.com>
[ Upstream commit 8234af2db3614d78b49e77ef46ea8cfab6586568 ]
Update the offload flag, TCQ_F_OFFLOADED, in each dump call (and ignore
the offloading function return value in relation to this flag).
This is done because a qdisc is being initialized, and therefore offloaded
before being grafted. Since the ability of the driver to offload the qdisc
depends on its location, a qdisc can be offloaded and un-offloaded by graft
calls that don't affect the qdisc itself.
Fixes: 428a68af3a7c ("net: sched: Move to new offload indication in RED")
Signed-off-by: Nogah Frankel <nogahf(a)mellanox.com>
Reviewed-by: Yuval Mintz <yuvalm(a)mellanox.com>
Acked-by: Jiri Pirko <jiri(a)mellanox.com>
Signed-off-by: David S. Miller <davem(a)davemloft.net>
Signed-off-by: Sasha Levin <alexander.levin(a)microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
net/sched/sch_red.c | 26 ++++++++++++++------------
1 file changed, 14 insertions(+), 12 deletions(-)
--- a/net/sched/sch_red.c
+++ b/net/sched/sch_red.c
@@ -157,7 +157,6 @@ static int red_offload(struct Qdisc *sch
.handle = sch->handle,
.parent = sch->parent,
};
- int err;
if (!tc_can_offload(dev) || !dev->netdev_ops->ndo_setup_tc)
return -EOPNOTSUPP;
@@ -172,14 +171,7 @@ static int red_offload(struct Qdisc *sch
opt.command = TC_RED_DESTROY;
}
- err = dev->netdev_ops->ndo_setup_tc(dev, TC_SETUP_QDISC_RED, &opt);
-
- if (!err && enable)
- sch->flags |= TCQ_F_OFFLOADED;
- else
- sch->flags &= ~TCQ_F_OFFLOADED;
-
- return err;
+ return dev->netdev_ops->ndo_setup_tc(dev, TC_SETUP_QDISC_RED, &opt);
}
static void red_destroy(struct Qdisc *sch)
@@ -294,12 +286,22 @@ static int red_dump_offload_stats(struct
.stats.qstats = &sch->qstats,
},
};
+ int err;
+
+ sch->flags &= ~TCQ_F_OFFLOADED;
- if (!(sch->flags & TCQ_F_OFFLOADED))
+ if (!tc_can_offload(dev) || !dev->netdev_ops->ndo_setup_tc)
+ return 0;
+
+ err = dev->netdev_ops->ndo_setup_tc(dev, TC_SETUP_QDISC_RED,
+ &hw_stats);
+ if (err == -EOPNOTSUPP)
return 0;
- return dev->netdev_ops->ndo_setup_tc(dev, TC_SETUP_QDISC_RED,
- &hw_stats);
+ if (!err)
+ sch->flags |= TCQ_F_OFFLOADED;
+
+ return err;
}
static int red_dump(struct Qdisc *sch, struct sk_buff *skb)
Patches currently in stable-queue which might be from nogahf(a)mellanox.com are
queue-4.15/net_sch-red-fix-the-new-offload-indication.patch
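The per-dump flag update described in the RED patch above can be sketched as follows. This is a minimal sketch: SKETCH_F_OFFLOADED, struct sketch_qdisc and sketch_dump_offload() are hypothetical, and the can_offload and hw_err parameters are assumptions that stand in for the tc_can_offload() check and for the return value of the driver callback. The flag is cleared first and set again only if the hardware query succeeds in this particular dump.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define SKETCH_F_OFFLOADED 0x1u

/* Hypothetical qdisc carrying only the flags word. */
struct sketch_qdisc {
	unsigned int flags;
};

/*
 * Recompute the offload flag on every dump.
 * hw_err stands in for the result of the driver stats callback.
 */
int sketch_dump_offload(struct sketch_qdisc *sch, int can_offload, int hw_err)
{
	assert(sch != NULL);

	sch->flags &= ~SKETCH_F_OFFLOADED;

	if (!can_offload)
		return 0;
	if (hw_err == -EOPNOTSUPP)
		return 0;	/* not offloaded here, but not an error */
	if (!hw_err)
		sch->flags |= SKETCH_F_OFFLOADED;

	return hw_err;
}
```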
This is a note to let you know that I've just added the patch titled
net/mlx4_en: Change default QoS settings
to the 4.15-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
net-mlx4_en-change-default-qos-settings.patch
and it can be found in the queue-4.15 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From foo@baz Mon Apr 9 10:16:32 CEST 2018
From: Moni Shoua <monis(a)mellanox.com>
Date: Thu, 28 Dec 2017 16:26:11 +0200
Subject: net/mlx4_en: Change default QoS settings
From: Moni Shoua <monis(a)mellanox.com>
[ Upstream commit a42b63c1ac1986f17f71bc91a6b0aaa14d4dae71 ]
Change the default mapping between TC and TCG as follows:
Prio     |  TC/TCG
         |  from          to
         |  (set by FW)   (set by SW)
---------+-----------------------------------
0        |  0/0           0/7
1        |  1/0           0/6
2        |  2/0           0/5
3        |  3/0           0/4
4        |  4/0           0/3
5        |  5/0           0/2
6        |  6/0           0/1
7        |  7/0           0/0
With these new settings, a pause frame for any prio stops traffic for
all prios.
Fixes: 564c274c3df0 ("net/mlx4_en: DCB QoS support")
Signed-off-by: Moni Shoua <monis(a)mellanox.com>
Signed-off-by: Maor Gottlieb <maorg(a)mellanox.com>
Signed-off-by: Tariq Toukan <tariqt(a)mellanox.com>
Signed-off-by: David S. Miller <davem(a)davemloft.net>
Signed-off-by: Sasha Levin <alexander.levin(a)microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
drivers/net/ethernet/mellanox/mlx4/en_dcb_nl.c | 5 +++++
drivers/net/ethernet/mellanox/mlx4/en_netdev.c | 7 +++++++
drivers/net/ethernet/mellanox/mlx4/mlx4_en.h | 1 +
3 files changed, 13 insertions(+)
--- a/drivers/net/ethernet/mellanox/mlx4/en_dcb_nl.c
+++ b/drivers/net/ethernet/mellanox/mlx4/en_dcb_nl.c
@@ -310,6 +310,7 @@ static int mlx4_en_ets_validate(struct m
}
switch (ets->tc_tsa[i]) {
+ case IEEE_8021QAZ_TSA_VENDOR:
case IEEE_8021QAZ_TSA_STRICT:
break;
case IEEE_8021QAZ_TSA_ETS:
@@ -347,6 +348,10 @@ static int mlx4_en_config_port_scheduler
/* higher TC means higher priority => lower pg */
for (i = IEEE_8021QAZ_MAX_TCS - 1; i >= 0; i--) {
switch (ets->tc_tsa[i]) {
+ case IEEE_8021QAZ_TSA_VENDOR:
+ pg[i] = MLX4_EN_TC_VENDOR;
+ tc_tx_bw[i] = MLX4_EN_BW_MAX;
+ break;
case IEEE_8021QAZ_TSA_STRICT:
pg[i] = num_strict++;
tc_tx_bw[i] = MLX4_EN_BW_MAX;
--- a/drivers/net/ethernet/mellanox/mlx4/en_netdev.c
+++ b/drivers/net/ethernet/mellanox/mlx4/en_netdev.c
@@ -3336,6 +3336,13 @@ int mlx4_en_init_netdev(struct mlx4_en_d
priv->msg_enable = MLX4_EN_MSG_LEVEL;
#ifdef CONFIG_MLX4_EN_DCB
if (!mlx4_is_slave(priv->mdev->dev)) {
+ u8 prio;
+
+ for (prio = 0; prio < IEEE_8021QAZ_MAX_TCS; ++prio) {
+ priv->ets.prio_tc[prio] = prio;
+ priv->ets.tc_tsa[prio] = IEEE_8021QAZ_TSA_VENDOR;
+ }
+
priv->dcbx_cap = DCB_CAP_DCBX_VER_CEE | DCB_CAP_DCBX_HOST |
DCB_CAP_DCBX_VER_IEEE;
priv->flags |= MLX4_EN_DCB_ENABLED;
--- a/drivers/net/ethernet/mellanox/mlx4/mlx4_en.h
+++ b/drivers/net/ethernet/mellanox/mlx4/mlx4_en.h
@@ -479,6 +479,7 @@ struct mlx4_en_frag_info {
#define MLX4_EN_BW_MIN 1
#define MLX4_EN_BW_MAX 100 /* Utilize 100% of the line */
+#define MLX4_EN_TC_VENDOR 0
#define MLX4_EN_TC_ETS 7
enum dcb_pfc_type {
Patches currently in stable-queue which might be from monis(a)mellanox.com are
queue-4.15/net-mlx4_en-change-default-qos-settings.patch
This is a note to let you know that I've just added the patch titled
net/mlx5: Fix race for multiple RoCE enable
to the 4.15-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
net-mlx5-fix-race-for-multiple-roce-enable.patch
and it can be found in the queue-4.15 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From foo@baz Mon Apr 9 10:16:32 CEST 2018
From: Daniel Jurgens <danielj(a)mellanox.com>
Date: Thu, 4 Jan 2018 17:25:31 +0200
Subject: net/mlx5: Fix race for multiple RoCE enable
From: Daniel Jurgens <danielj(a)mellanox.com>
[ Upstream commit 734dc065fc41f6143ff88225aa5d335cb1e0f6aa ]
There are two potential problems with the existing implementation.
1. Enable and disable can race after the atomic operations.
2. If a command fails the refcount is left in an inconsistent state.
Introduce a lock and perform error checking.
Fixes: a6f7d2aff623 ("net/mlx5: Add support for multiple RoCE enable")
Signed-off-by: Daniel Jurgens <danielj(a)mellanox.com>
Reviewed-by: Parav Pandit <parav(a)mellanox.com>
Signed-off-by: Leon Romanovsky <leon(a)kernel.org>
Signed-off-by: Jason Gunthorpe <jgg(a)mellanox.com>
Signed-off-by: Sasha Levin <alexander.levin(a)microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
drivers/net/ethernet/mellanox/mlx5/core/vport.c | 33 +++++++++++++++++++-----
include/linux/mlx5/driver.h | 2 -
2 files changed, 28 insertions(+), 7 deletions(-)
--- a/drivers/net/ethernet/mellanox/mlx5/core/vport.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/vport.c
@@ -36,6 +36,9 @@
#include <linux/mlx5/vport.h>
#include "mlx5_core.h"
+/* Mutex to hold while enabling or disabling RoCE */
+static DEFINE_MUTEX(mlx5_roce_en_lock);
+
static int _mlx5_query_vport_state(struct mlx5_core_dev *mdev, u8 opmod,
u16 vport, u32 *out, int outlen)
{
@@ -998,17 +1001,35 @@ static int mlx5_nic_vport_update_roce_st
int mlx5_nic_vport_enable_roce(struct mlx5_core_dev *mdev)
{
- if (atomic_inc_return(&mdev->roce.roce_en) != 1)
- return 0;
- return mlx5_nic_vport_update_roce_state(mdev, MLX5_VPORT_ROCE_ENABLED);
+ int err = 0;
+
+ mutex_lock(&mlx5_roce_en_lock);
+ if (!mdev->roce.roce_en)
+ err = mlx5_nic_vport_update_roce_state(mdev, MLX5_VPORT_ROCE_ENABLED);
+
+ if (!err)
+ mdev->roce.roce_en++;
+ mutex_unlock(&mlx5_roce_en_lock);
+
+ return err;
}
EXPORT_SYMBOL_GPL(mlx5_nic_vport_enable_roce);
int mlx5_nic_vport_disable_roce(struct mlx5_core_dev *mdev)
{
- if (atomic_dec_return(&mdev->roce.roce_en) != 0)
- return 0;
- return mlx5_nic_vport_update_roce_state(mdev, MLX5_VPORT_ROCE_DISABLED);
+ int err = 0;
+
+ mutex_lock(&mlx5_roce_en_lock);
+ if (mdev->roce.roce_en) {
+ mdev->roce.roce_en--;
+ if (mdev->roce.roce_en == 0)
+ err = mlx5_nic_vport_update_roce_state(mdev, MLX5_VPORT_ROCE_DISABLED);
+
+ if (err)
+ mdev->roce.roce_en++;
+ }
+ mutex_unlock(&mlx5_roce_en_lock);
+ return err;
}
EXPORT_SYMBOL_GPL(mlx5_nic_vport_disable_roce);
--- a/include/linux/mlx5/driver.h
+++ b/include/linux/mlx5/driver.h
@@ -826,7 +826,7 @@ struct mlx5_core_dev {
struct mlx5e_resources mlx5e_res;
struct {
struct mlx5_rsvd_gids reserved_gids;
- atomic_t roce_en;
+ u32 roce_en;
} roce;
#ifdef CONFIG_MLX5_FPGA
struct mlx5_fpga_device *fpga;
Patches currently in stable-queue which might be from danielj(a)mellanox.com are
queue-4.15/net-mlx5-fix-race-for-multiple-roce-enable.patch
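The refcount handling described in the RoCE patch above can be sketched as follows. This is a minimal single-threaded sketch: locking is deliberately left out (the comments mark where the driver holds mlx5_roce_en_lock), and struct sketch_dev, its fail_next_cmd knob and the sketch_* functions are hypothetical. It shows the error-checking half of the fix: the count only changes when the firmware command succeeded.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical device; fail_next_cmd makes the next command fail once. */
struct sketch_dev {
	unsigned int roce_en;
	int fail_next_cmd;
	int hw_enabled;
};

/* Stand-in for the firmware command that flips the RoCE state. */
static int sketch_update_state(struct sketch_dev *d, int enable)
{
	if (d->fail_next_cmd) {
		d->fail_next_cmd = 0;
		return -1;
	}
	d->hw_enabled = enable;
	return 0;
}

int sketch_enable_roce(struct sketch_dev *d)
{
	int err = 0;

	assert(d != NULL);
	/* The driver takes mlx5_roce_en_lock here. */
	if (!d->roce_en)
		err = sketch_update_state(d, 1);
	if (!err)
		d->roce_en++;
	/* ... and releases it here. */
	return err;
}

int sketch_disable_roce(struct sketch_dev *d)
{
	int err = 0;

	assert(d != NULL);
	/* The driver takes mlx5_roce_en_lock here. */
	if (d->roce_en) {
		d->roce_en--;
		if (d->roce_en == 0)
			err = sketch_update_state(d, 0);
		if (err)
			d->roce_en++;	/* roll back on failure */
	}
	/* ... and releases it here. */
	return err;
}
```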
This is a note to let you know that I've just added the patch titled
net: hns3: fix for changing MTU
to the 4.15-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
net-hns3-fix-for-changing-mtu.patch
and it can be found in the queue-4.15 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From foo@baz Mon Apr 9 10:16:32 CEST 2018
From: Fuyun Liang <liangfuyun1(a)huawei.com>
Date: Fri, 5 Jan 2018 18:18:20 +0800
Subject: net: hns3: fix for changing MTU
From: Fuyun Liang <liangfuyun1(a)huawei.com>
[ Upstream commit 5bad95a1e55f4d5bb41e130db859d57eaf1b1549 ]
When changing the MTU, the new MTU must be set on the netdevice.
Fixes: a8e8b7ff3517 ("net: hns3: Add support to change MTU in HNS3 hardware")
Signed-off-by: Fuyun Liang <liangfuyun1(a)huawei.com>
Signed-off-by: Peng Li <lipeng321(a)huawei.com>
Signed-off-by: David S. Miller <davem(a)davemloft.net>
Signed-off-by: Sasha Levin <alexander.levin(a)microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
drivers/net/ethernet/hisilicon/hns3/hns3pf/hns3_enet.c | 2 ++
1 file changed, 2 insertions(+)
--- a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hns3_enet.c
+++ b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hns3_enet.c
@@ -1324,6 +1324,8 @@ static int hns3_nic_change_mtu(struct ne
return ret;
}
+ netdev->mtu = new_mtu;
+
/* if the netdev was running earlier, bring it up again */
if (if_running && hns3_nic_net_open(netdev))
ret = -EINVAL;
Patches currently in stable-queue which might be from liangfuyun1(a)huawei.com are
queue-4.15/net-hns3-fix-for-getting-auto-negotiation-state-in-hclge_get_autoneg.patch
queue-4.15/net-hns3-add-asym-pause-support-to-phy-default-features.patch
queue-4.15/net-hns3-fix-for-changing-mtu.patch
This is a note to let you know that I've just added the patch titled
net: hns3: free the ring_data structrue when change tqps
to the 4.15-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
net-hns3-free-the-ring_data-structrue-when-change-tqps.patch
and it can be found in the queue-4.15 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From foo@baz Mon Apr 9 10:16:32 CEST 2018
From: Peng Li <lipeng321(a)huawei.com>
Date: Fri, 22 Dec 2017 12:21:43 +0800
Subject: net: hns3: free the ring_data structrue when change tqps
From: Peng Li <lipeng321(a)huawei.com>
[ Upstream commit 99fdf6b1cadf41bb253408589788f025027274f3 ]
This patch fixes a memory leak in the change-tqps process, where
the functions hns3_uninit_all_ring and hns3_init_all_ring
may be called many times.
Signed-off-by: Peng Li <lipeng321(a)huawei.com>
Signed-off-by: Mingguang Qu <qumingguang(a)huawei.com>
Signed-off-by: David S. Miller <davem(a)davemloft.net>
Signed-off-by: Sasha Levin <alexander.levin(a)microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
drivers/net/ethernet/hisilicon/hns3/hns3pf/hns3_enet.c | 4 ++++
1 file changed, 4 insertions(+)
--- a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hns3_enet.c
+++ b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hns3_enet.c
@@ -2785,8 +2785,12 @@ int hns3_uninit_all_ring(struct hns3_nic
h->ae_algo->ops->reset_queue(h, i);
hns3_fini_ring(priv->ring_data[i].ring);
+ devm_kfree(priv->dev, priv->ring_data[i].ring);
hns3_fini_ring(priv->ring_data[i + h->kinfo.num_tqps].ring);
+ devm_kfree(priv->dev,
+ priv->ring_data[i + h->kinfo.num_tqps].ring);
}
+ devm_kfree(priv->dev, priv->ring_data);
return 0;
}
Patches currently in stable-queue which might be from lipeng321(a)huawei.com are
queue-4.15/net-hns3-fix-for-getting-auto-negotiation-state-in-hclge_get_autoneg.patch
queue-4.15/net-hns3-free-the-ring_data-structrue-when-change-tqps.patch
queue-4.15/net-hns3-fix-a-loop-index-error-of-tqp-statistics-query.patch
queue-4.15/net-hns3-fix-an-error-macro-definition-of-hns3_tqp_stat.patch
queue-4.15/net-hns3-add-asym-pause-support-to-phy-default-features.patch
queue-4.15/net-hns3-fix-an-error-of-total-drop-packet-statistics.patch
queue-4.15/net-hns3-fix-for-changing-mtu.patch
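The leak addressed by the ring_data patch above can be sketched in user space as follows. This is a minimal sketch with hypothetical names: calloc()/free() stand in for the device-managed allocator, an int stands in for a ring, and live_allocs is an assumed counter added only to make the leak visible. It shows that repeated init/uninit rounds stay balanced once uninit frees every ring and the ring_data array itself.

```c
#include <assert.h>
#include <stdlib.h>

struct sketch_ring_data {
	int *ring;
};

/* Hypothetical private data; live_allocs counts outstanding allocations. */
struct sketch_priv {
	struct sketch_ring_data *ring_data;
	int num_tqps;
	int live_allocs;
};

/* Free both rings of every queue pair, then the ring_data array. */
void sketch_uninit_all_ring(struct sketch_priv *p)
{
	for (int i = 0; i < 2 * p->num_tqps; i++) {
		if (p->ring_data[i].ring) {
			free(p->ring_data[i].ring);
			p->ring_data[i].ring = NULL;
			p->live_allocs--;
		}
	}
	free(p->ring_data);
	p->ring_data = NULL;
	p->live_allocs--;
}

/* Allocate ring_data plus one TX and one RX ring per queue pair. */
int sketch_init_all_ring(struct sketch_priv *p)
{
	int n = 2 * p->num_tqps;

	assert(p->num_tqps > 0);
	p->ring_data = calloc((size_t)n, sizeof(*p->ring_data));
	if (!p->ring_data)
		return -1;
	p->live_allocs++;

	for (int i = 0; i < n; i++) {
		p->ring_data[i].ring = calloc(1, sizeof(int));
		if (!p->ring_data[i].ring) {
			sketch_uninit_all_ring(p);
			return -1;
		}
		p->live_allocs++;
	}
	return 0;
}
```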