[v1] net/e1000: do not update link status in secondary process

Message ID 1720783847-12292-1-git-send-email-junwang01@cestc.cn (mailing list archive)
State Accepted
Delegated to: Bruce Richardson
Headers
Series [v1] net/e1000: do not update link status in secondary process |

Checks

Context Check Description
ci/checkpatch success coding style OK
ci/loongarch-compilation success Compilation OK
ci/loongarch-unit-testing success Unit Testing PASS
ci/Intel-compilation success Compilation OK
ci/intel-Testing success Testing PASS
ci/iol-intel-Performance success Performance Testing PASS
ci/intel-Functional success Functional PASS
ci/github-robot: build success github build: passed
ci/iol-mellanox-Performance success Performance Testing PASS
ci/iol-abi-testing success Testing PASS
ci/iol-sample-apps-testing success Testing PASS
ci/iol-compile-amd64-testing success Testing PASS
ci/iol-unit-arm64-testing success Testing PASS
ci/iol-compile-arm64-testing success Testing PASS
ci/iol-unit-amd64-testing success Testing PASS
ci/iol-intel-Functional success Functional Testing PASS
ci/iol-broadcom-Functional success Functional Testing PASS
ci/iol-broadcom-Performance success Performance Testing PASS

Commit Message

Jun Wang July 12, 2024, 11:30 a.m. UTC
The code to update link status is not safe in secondary process.
If called from secondary it will crash, example from dumpcap:
    eth_em_link_update

Signed-off-by: Jun Wang <junwang01@cestc.cn>
---
 drivers/net/e1000/em_ethdev.c | 3 +++
 1 file changed, 3 insertions(+)
  

Comments

Stephen Hemminger July 12, 2024, 5:17 p.m. UTC | #1
Unaddressed
On Fri, 12 Jul 2024 19:30:47 +0800
Jun Wang <junwang01@cestc.cn> wrote:

> The code to update link status is not safe in secondary process.
> If called from secondary it will crash, example from dumpcap:
>     eth_em_link_update
> 
> Signed-off-by: Jun Wang <junwang01@cestc.cn>

Wouldn't it be better to fix the code in e1000_check_link to work in
secondary process. There are network virtual appliances that use
secondary process for all processing.
  
Jun Wang July 14, 2024, 8:26 a.m. UTC | #2
>> The code to update link status is not safe in secondary process.
>> If called from secondary it will crash, example from dumpcap:
>>     eth_em_link_update
>>
>> Signed-off-by: Jun Wang <junwang01@cestc.cn>
> 
> Wouldn't it be better to fix the code in e1000_check_link to work in
> secondary process. There are network virtual appliances that use
> secondary process for all processing.

Yes, the e1000 virtual network card currently does not work properly 
in the secondary process. After skipping eth_em_link_update, I tested
the e1000 card and it was able to capture packets normally. For the 
secondary process, I think eth_em_link_update is not necessary.



Jun Wang
  
Bruce Richardson July 22, 2024, 5:08 p.m. UTC | #3
On Sun, Jul 14, 2024 at 04:26:26PM +0800, Jun Wang wrote:
>    >> The code to update link status is not safe in secondary process.
>    >> If called from secondary it will crash, example from dumpcap:
>    >>     eth_em_link_update
>    >>
>    >> Signed-off-by: Jun Wang <junwang01@cestc.cn>
>    >
>    > Wouldn't it be better to fix the code in e1000_check_link to work in
>    > secondary process. There are network virtual appliances that use
>    > secondary process for all processing.
> 
>    Yes, the e1000 virtual network card currently does not work properly
> 
>    in the secondary process. After skipping eth_em_link_update, I tested
> 
>    the e1000 card and it was able to capture packets normally. For the
> 
>    secondary process, I think eth_em_link_update is not necessary.
>    __________________________________________________________________
> 

Hi Jun,

can you provide some instructions as to how I can go about reproducing the
issue? I used a VM with emulated e1000 NICs on it and was able to run
testpmd and dumpcap side by side. Similarly, I was able to run two
instances of the symmetric_mp app side by size without seeing any crashes.
I'd just like to verify the issue and confirm this fixes a problem before
merging.

Thanks,
/Bruce
  
Jun Wang July 23, 2024, 2:07 a.m. UTC | #4
I used the e1000 NIC with OVS-DPDK and experienced a failure when using the 
/dpdk/app/dpdk-dumpcap command to capture packets. After making modifications, 
it worked fine.

/dpdk/app/dpdk-dumpcap -i 0000:00:04.0
File: /tmp/dpdk-dumpcap_0_0000:00:04.0_20240723020203.pcapng
Segmentation fault (core dumped)


ovs-vsctl list open .
_uuid               : 82223d04-0a60-44fe-83ac-398a01b0bca3
bridges             : [8f188622-c17d-4d21-ad30-22c30860f050, ca88255e-54e5-4b06-aaff-fdde55018f35]
cur_cfg             : 15
datapath_types      : [netdev, system]
datapaths           : {netdev=d801e891-7990-4f77-ae9a-c776ba4430a4, system=25e5a761-3160-4ea1-adab-b4cddd8309e4}
db_version          : "8.3.0"
dpdk_initialized    : true
dpdk_version        : "DPDK 23.11.0"
external_ids        : {hostname=cell1-xgw-dpdk, ovn-bridge-datapath-type=netdev, ovn-bridge-mappings="external:br-tun,share:br-tun,direct-connect:br-tun", ovn-encap-ip="172.16.0.13", ovn-encap-tos=inherit, ovn-encap-type=geneve, ovn-remote="tcp:[192.168.122.171]:6642", ovn-set-local-ip="true", rundir="/var/run/openvswitch", system-id=cell1-xgw-dpdk}
iface_types         : [bareudp, dpdk, dpdkvhostuser, dpdkvhostuserclient, erspan, geneve, gre, gtpu, internal, ip6erspan, ip6gre, lisp, patch, stt, system, tap, vxlan]
manager_options     : []
next_cfg            : 15
other_config        : {dpdk-extra=" -a 0000:00:04.0" , dpdk-init="true", pmd-perf-metrics="false", vlan-limit="0"}
ovs_version         : "2.17.5"
ssl                 : []
statistics          : {}
system_type         : cclinux
system_version      : "22.09.2"


status              : {bus_info="bus_name=pci, vendor_id=8086, device_id=100e", driver_name=net_e1000_em, if_descr="DPDK 23.11.0 net_e1000_em", if_type="6", link_speed="1Gbps", max_hash_mac_addrs="0", max_mac_addrs="15", max_rx_pktlen="1518", max_rx_queues="2", max_tx_queues="2", max_vfs="0", max_vmdq_pools="0", min_rx_bufsize="256", n_rxq="1", n_txq="2", numa_id="-1", port_no="0", rx-steering=rss, rx_csum_offload="true", tx_geneve_tso_offload="false", tx_ip_csum_offload="true", tx_out_ip_csum_offload="false", tx_out_udp_csum_offload="false", tx_sctp_csum_offload="false", tx_tcp_csum_offload="true", tx_tcp_seg_offload="false", tx_udp_csum_offload="true", tx_vxlan_tso_offload="false"}


/usr/local/bin/dpdk-devbind.py --status

Network devices using DPDK-compatible driver
============================================
0000:00:04.0 '82540EM Gigabit Ethernet Controller 100e' drv=uio_pci_generic unused=

Network devices using kernel driver
===================================
0000:00:03.0 '82540EM Gigabit Ethernet Controller 100e' if=eth0 drv=e1000 unused=uio_pci_generic *Active*




Jun Wang
  
Bruce Richardson Aug. 22, 2024, 3:58 p.m. UTC | #5
On Fri, Jul 12, 2024 at 07:30:47PM +0800, Jun Wang wrote:
> The code to update link status is not safe in secondary process.
> If called from secondary it will crash, example from dumpcap:
>     eth_em_link_update
> 
> Signed-off-by: Jun Wang <junwang01@cestc.cn>
> ---
>  drivers/net/e1000/em_ethdev.c | 3 +++
>  1 file changed, 3 insertions(+)
>

Given this is fixing an issue experienced in the real-world I think we
should take this patch. As Stephen says, a better solution would be to have
the whole function work properly in secondary, but I'd rather avoid crashes
as a priority.

Fixes: 805803445a02 ("e1000: support EM devices (also known as e1000/e1000e)")
Cc: stable@dpdk.org

Acked-by: Bruce Richardson <bruce.richardson@intel.com>
  
Bruce Richardson Aug. 22, 2024, 4:02 p.m. UTC | #6
On Thu, Aug 22, 2024 at 04:58:59PM +0100, Bruce Richardson wrote:
> On Fri, Jul 12, 2024 at 07:30:47PM +0800, Jun Wang wrote:
> > The code to update link status is not safe in secondary process.
> > If called from secondary it will crash, example from dumpcap:
> >     eth_em_link_update
> > 
> > Signed-off-by: Jun Wang <junwang01@cestc.cn>
> > ---
> >  drivers/net/e1000/em_ethdev.c | 3 +++
> >  1 file changed, 3 insertions(+)
> >
> 
> Given this is fixing an issue experienced in the real-world I think we
> should take this patch. As Stephen says, a better solution would be to have
> the whole function work properly in secondary, but I'd rather avoid crashes
> as a priority.
> 
> Fixes: 805803445a02 ("e1000: support EM devices (also known as e1000/e1000e)")
> Cc: stable@dpdk.org
> 
> Acked-by: Bruce Richardson <bruce.richardson@intel.com> 

Applied to dpdk-next-net-intel.

Thanks,
/Bruce
  

Patch

diff --git a/drivers/net/e1000/em_ethdev.c b/drivers/net/e1000/em_ethdev.c
index c5a4dec..f6875b0 100644
--- a/drivers/net/e1000/em_ethdev.c
+++ b/drivers/net/e1000/em_ethdev.c
@@ -1136,6 +1136,9 @@  static int eth_em_pci_remove(struct rte_pci_device *pci_dev)
 	struct rte_eth_link link;
 	int link_up, count;
 
+	if (rte_eal_process_type() != RTE_PROC_PRIMARY)
+		return -1;
+
 	link_up = 0;
 	hw->mac.get_link_status = 1;