[v5,2/2] mem: telemetry support for system memory information

Message ID 20220929114313.1346972-2-amitprakashs@marvell.com (mailing list archive)
State Superseded, archived
Delegated to: David Marchand
Headers
Series [v5,1/2] mem: telemetry support for memseg and element information |

Checks

Context Check Description
ci/checkpatch success coding style OK
ci/Intel-compilation success Compilation OK
ci/github-robot: build success github build: passed
ci/iol-mellanox-Performance success Performance Testing PASS
ci/iol-aarch64-unit-testing success Testing PASS
ci/iol-aarch64-compile-testing success Testing PASS
ci/iol-x86_64-unit-testing success Testing PASS
ci/iol-intel-Functional success Functional Testing PASS
ci/iol-x86_64-compile-testing success Testing PASS
ci/iol-intel-Performance success Performance Testing PASS
ci/intel-Testing success Testing PASS

Commit Message

Amit Prakash Shukla Sept. 29, 2022, 11:43 a.m. UTC
  Changes adds telemetry support to display system memory information,
allocated using calls malloc, calloc, mmap, etc. This patch
is based on malloc_info. This patch adds following endpoints:

1. /sysmem/sys_heap_list
The commands displays the arenas currently in use.
Example:
--> /sysmem/sys_heap_list
{"/sysmem/sys_heap_list": [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10]}

2. /sysmem/sys_heap_info,<arena-id>
This command displays the information about arena.
Example:
--> /sysmem/sys_heap_info,0
{"/sysmem/sys_heap_info": {"Arena_id": 0, "Allocated_size": 2069934, \
 "Free_count": 4, "Free_size": 223826, "Curr_size": 2293760,         \
 "Mmap_count": 0, "Mmap_size": 0, "Heap_count": 0,                   \
 "Heap_size": 2293760}}
--> /sysmem/sys_heap_info,6
{"/sysmem/sys_heap_info": {"Arena_id": 6, "Allocated_size": 3136, \
 "Free_count": 2, "Free_size": 193472, "Curr_size": 196608,       \
 "Mmap_count": 0, "Mmap_size": 0, "Heap_count": 1,                \
 "Heap_size": 196608}}

The last arena-id in the list gives total of all arenas.

--> /sysmem/sys_heap_info,10
{"/sysmem/sys_heap_info": {"Arena_id": 10, "Allocated_size": 2107774, \
 "Free_count": 20, "Free_size": 1955458, "Curr_size": 4063232,        \
 "Mmap_count": 0, "Mmap_size": 0, "Heap_count": 0,                    \
 "Heap_size": 4063232}}

Signed-off-by: Amit Prakash Shukla <amitprakashs@marvell.com>
---
 lib/eal/common/eal_common_memory.c | 330 +++++++++++++++++++++++++++++
 1 file changed, 330 insertions(+)
  

Comments

David Marchand Oct. 7, 2022, 7:46 p.m. UTC | #1
On Thu, Sep 29, 2022 at 1:44 PM Amit Prakash Shukla
<amitprakashs@marvell.com> wrote:
>
> Changes adds telemetry support to display system memory information,
> allocated using calls malloc, calloc, mmap, etc. This patch
> is based on malloc_info. This patch adds following endpoints:

malloc_info is a GNU extension.
It is not available in musl and it breaks compilation in Alpine Linux.
So I can't take this patch.


>
> 1. /sysmem/sys_heap_list
> The commands displays the arenas currently in use.
> Example:
> --> /sysmem/sys_heap_list
> {"/sysmem/sys_heap_list": [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10]}
>

I am unsure about the command names.

Those commands are really low level and tied to glibc malloc.
It is unlikely we will have an unified naming with other libc/OS.

I would prefer another pair of eyes, especially on this patch.
Dmitry, Anatoly?
  
Amit Prakash Shukla Oct. 11, 2022, 7:10 a.m. UTC | #2
Thanks David for the feedback. Please find the proposed fixed inline.

> -----Original Message-----
> From: David Marchand <david.marchand@redhat.com>
> Sent: Saturday, October 8, 2022 1:17 AM
> To: Amit Prakash Shukla <amitprakashs@marvell.com>; Thomas Monjalon
> <thomas@monjalon.net>; Anatoly Burakov <anatoly.burakov@intel.com>;
> dmitry.kozliuk@gmail.com
> Cc: dev@dpdk.org; Jerin Jacob Kollanukkaran <jerinj@marvell.com>;
> bruce.richardson@intel.com; ciara.power@intel.com
> Subject: [EXT] Re: [PATCH v5 2/2] mem: telemetry support for system
> memory information
> 
> External Email
> 
> ----------------------------------------------------------------------
> On Thu, Sep 29, 2022 at 1:44 PM Amit Prakash Shukla
> <amitprakashs@marvell.com> wrote:
> >
> > Changes adds telemetry support to display system memory information,
> > allocated using calls malloc, calloc, mmap, etc. This patch is based
> > on malloc_info. This patch adds following endpoints:
> 
> malloc_info is a GNU extension.
> It is not available in musl and it breaks compilation in Alpine Linux.
> So I can't take this patch.

Shall we limit this for Glibc at compile time ?

+#if defined __GLIBC__ && defined __GLIBC_PREREQ
+#if __GLIBC_PREREQ(2, 10)
        /* Gets system memory stat's XML format. */
        ret = malloc_info(0, fp);
+#endif
+#else
+       RTE_LOG(DEBUG, EAL, "Error: malloc_info not supported by libc\n");
+       fclose(fp);
+       return -1;
+#endif

I will send out a v6 if the above solution is fine.

> 
> 
> >
> > 1. /sysmem/sys_heap_list
> > The commands displays the arenas currently in use.
> > Example:
> > --> /sysmem/sys_heap_list
> > {"/sysmem/sys_heap_list": [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10]}
> >
> 
> I am unsure about the command names.
> 
> Those commands are really low level and tied to glibc malloc.
> It is unlikely we will have an unified naming with other libc/OS.
> 
> I would prefer another pair of eyes, especially on this patch.
> Dmitry, Anatoly?
> 
> 
> --
> David Marchand
  
Dmitry Kozlyuk Oct. 20, 2022, 7:18 p.m. UTC | #3
2022-10-11 07:10 (UTC+0000), Amit Prakash Shukla:
> Thanks David for the feedback. Please find the proposed fixed inline.
> 
> > -----Original Message-----
> > From: David Marchand <david.marchand@redhat.com>
> > Sent: Saturday, October 8, 2022 1:17 AM
> > To: Amit Prakash Shukla <amitprakashs@marvell.com>; Thomas Monjalon
> > <thomas@monjalon.net>; Anatoly Burakov <anatoly.burakov@intel.com>;
> > dmitry.kozliuk@gmail.com
> > Cc: dev@dpdk.org; Jerin Jacob Kollanukkaran <jerinj@marvell.com>;
> > bruce.richardson@intel.com; ciara.power@intel.com
> > Subject: [EXT] Re: [PATCH v5 2/2] mem: telemetry support for system
> > memory information
> > 
> > External Email
> > 
> > ----------------------------------------------------------------------
> > On Thu, Sep 29, 2022 at 1:44 PM Amit Prakash Shukla
> > <amitprakashs@marvell.com> wrote:  
> > >
> > > Changes adds telemetry support to display system memory information,
> > > allocated using calls malloc, calloc, mmap, etc. This patch is based
> > > on malloc_info. This patch adds following endpoints:  
> > 
> > malloc_info is a GNU extension.
> > It is not available in musl and it breaks compilation in Alpine Linux.
> > So I can't take this patch.  
> 
> Shall we limit this for Glibc at compile time ?
> 
> +#if defined __GLIBC__ && defined __GLIBC_PREREQ
> +#if __GLIBC_PREREQ(2, 10)
>         /* Gets system memory stat's XML format. */
>         ret = malloc_info(0, fp);
> +#endif
> +#else
> +       RTE_LOG(DEBUG, EAL, "Error: malloc_info not supported by libc\n");
> +       fclose(fp);
> +       return -1;
> +#endif
> 
> I will send out a v6 if the above solution is fine.
> 
> > 
> >   
> > >
> > > 1. /sysmem/sys_heap_list
> > > The commands displays the arenas currently in use.
> > > Example:  
> > > --> /sysmem/sys_heap_list  
> > > {"/sysmem/sys_heap_list": [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10]}
> > >  
> > 
> > I am unsure about the command names.
> > 
> > Those commands are really low level and tied to glibc malloc.
> > It is unlikely we will have an unified naming with other libc/OS.
> > 
> > I would prefer another pair of eyes, especially on this patch.
> > Dmitry, Anatoly?

I agree with David.

Microsoft CRT provides completely different statistics.
I can't say for FreeBSD.
Even if GNU libc is used, the process can be run with jemalloc, for example,
and this statistics become meaningless then.
What is "system" memory, anyway? It can be acquired bypassing malloc,
it can be hugepages mmap'd without DPDK, etc.

What is the task this API must solve?
My guess is that monitoring needs to know how much physical memory
1) the process uses and 2) how much it can use, so on Linux this API
could be implemented using getrusage() and getrlimit().
I don't think that a fully portable implementation is possible,
but at least API will be unified.
  
Stephen Hemminger Oct. 20, 2022, 7:50 p.m. UTC | #4
On Thu, 20 Oct 2022 22:18:27 +0300
Dmitry Kozlyuk <dmitry.kozliuk@gmail.com> wrote:

> > 
> > I will send out a v6 if the above solution is fine.
> >   
> > > 
> > >     
> > > >
> > > > 1. /sysmem/sys_heap_list
> > > > The commands displays the arenas currently in use.
> > > > Example:    
> > > > --> /sysmem/sys_heap_list    
> > > > {"/sysmem/sys_heap_list": [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10]}
> > > >    
> > > 
> > > I am unsure about the command names.
> > > 
> > > Those commands are really low level and tied to glibc malloc.
> > > It is unlikely we will have an unified naming with other libc/OS.
> > > 
> > > I would prefer another pair of eyes, especially on this patch.
> > > Dmitry, Anatoly?  
> 
> I agree with David.
> 
> Microsoft CRT provides completely different statistics.
> I can't say for FreeBSD.
> Even if GNU libc is used, the process can be run with jemalloc, for example,
> and this statistics become meaningless then.
> What is "system" memory, anyway? It can be acquired bypassing malloc,
> it can be hugepages mmap'd without DPDK, etc.
> 
> What is the task this API must solve?
> My guess is that monitoring needs to know how much physical memory
> 1) the process uses and 2) how much it can use, so on Linux this API
> could be implemented using getrusage() and getrlimit().
> I don't think that a fully portable implementation is possible,
> but at least API will be unified.


NAK
You are opening a mess here.
The malloc_info() exporting XML also means you have to parse the crap.

Telemetry to should stick to the DPDK environment and not try to support
looking at other parts of the system. Otherwise, it risks becoming
tied to one OS and version.
  

Patch

diff --git a/lib/eal/common/eal_common_memory.c b/lib/eal/common/eal_common_memory.c
index 6b863979e9..b5326119f0 100644
--- a/lib/eal/common/eal_common_memory.c
+++ b/lib/eal/common/eal_common_memory.c
@@ -9,6 +9,9 @@ 
 #include <stdlib.h>
 #include <string.h>
 #include <inttypes.h>
+#ifdef RTE_EXEC_ENV_LINUX
+#include <malloc.h>
+#endif
 
 #include <rte_fbarray.h>
 #include <rte_memory.h>
@@ -1124,6 +1127,12 @@  rte_eal_memory_init(void)
 #define EAL_MEMSEG_INFO_REQ		"/eal/memseg_info"
 #define EAL_ELEMENT_LIST_REQ		"/eal/element_list"
 #define EAL_ELEMENT_INFO_REQ		"/eal/element_info"
+
+#ifdef RTE_EXEC_ENV_LINUX
+#define SYSMEMORY_LIST_REQ		"/sysmem/sys_heap_list"
+#define SYSMEMORY_INFO_REQ		"/sysmem/sys_heap_info"
+#endif
+
 #define ADDR_STR			15
 
 
@@ -1685,6 +1694,318 @@  handle_eal_element_info_request(const char *cmd __rte_unused,
 	return 0;
 }
 
+#ifdef RTE_EXEC_ENV_LINUX
+#define MAX_SYS_MEM_ARENAS	128
+#define MAX_TAG_CHAR		128
+
+/* Memory size are in bytes. */
+struct mem_stats {
+	uint64_t fast_count; /* Number of free blocks in fast bin. */
+	uint64_t fast_size;  /* Size in bytes of free blocks in fast bin. */
+	uint64_t rest_count; /* Number of free blocks in bin. */
+	uint64_t rest_size;  /* Size in bytes of free blocks in bin. */
+	uint64_t mmap_count; /* Number of mmap blocks. */
+	uint64_t mmap_size;  /* Size in bytes of mmap'd memory. */
+	uint64_t curr_size;  /* Size in bytes allocated by system. */
+	uint64_t heap_size;  /* Heap size in bytes. */
+	uint64_t heap_count; /* Number of heaps. */
+};
+
+struct rte_heap_mem_stats {
+	unsigned int num_active_arena;
+	struct mem_stats stats[MAX_SYS_MEM_ARENAS];
+};
+
+/* This function shall be called to parse only attributes.
+ * Parsing of the "tags" shall be done by the caller.
+ */
+static int
+parse_attr(char *buf, uint32_t *i, char *attr, const char *key)
+{
+	int j = 0;
+	int keymatch = 0;
+
+	attr[j] = '\0';
+
+	while ((buf[*i] != '>') && (j < MAX_TAG_CHAR)) {
+		/* Ignore spaces. */
+		if (buf[*i] == ' ') {
+			attr[j] = '\0';
+			j = 0;
+			(*i)++;
+			continue;
+		}
+
+		/* Attribute key */
+		if (buf[*i] == '=') {
+			attr[j] = '\0';
+			j = 0;
+			(*i)++;
+
+			/* If the key is matched, extract the value. */
+			if (strncmp(attr, key, strlen(key)) != 0)
+				continue;
+			else
+				keymatch = 1;
+		}
+
+		/* Attribute value */
+		if ((buf[*i] == '"') && (keymatch == 1)) {
+			j = 0;
+			(*i)++;
+
+			while ((buf[*i] != '"') && (j < MAX_TAG_CHAR))
+				attr[(j)++] = buf[(*i)++];
+			attr[j] = '\0';
+			(*i)++;
+			return 0;
+		}
+
+		keymatch = 0;
+		attr[(j)++] = buf[(*i)++];
+	}
+
+	(*i)++;
+	return -1;
+}
+
+/* Get the system memory stats into buffer by calling malloc_info().
+ * malloc_info() returns the stats in XML format. Parse the XML to extract
+ * number of heaps, size of each heap, free memory in heap.
+ */
+static int
+parse_heap_mem_stats(struct rte_heap_mem_stats *heap_stats)
+{
+	char tag[MAX_TAG_CHAR] = {0};
+	int old_mem_index = -1;
+	int mem_index = -1;
+	uint32_t i = 0;
+	uint32_t j = 0;
+	size_t length;
+	char *buf;
+	FILE *fp;
+	int ret;
+
+	/* buf is dynamically allocated by open_memstream. */
+	fp = open_memstream(&buf, &length);
+	if (fp == NULL) {
+		RTE_LOG(DEBUG, EAL, "Error: Failed to open memory stream\n");
+		return -1;
+	}
+
+	/* Gets system memory stat's XML format. */
+	ret = malloc_info(0, fp);
+	fclose(fp);
+
+	if (ret != 0) {
+		RTE_LOG(DEBUG, EAL, "Error: malloc_info returned error\n");
+		return -1;
+	}
+
+	while (i < length) {
+		j = 0;
+		tag[j] = '\0';
+
+		/* Ignore newline and spaces. */
+		if ((buf[i] == '\n') || (buf[i] == ' ') || (buf[i] == '/') ||
+		    (buf[i] == '>')) {
+			i++;
+			continue;
+		}
+
+		if (buf[i] == '<') {
+			i++;
+			while ((buf[i] != ' ') && (buf[i] != '>') &&
+			       (j < MAX_TAG_CHAR)) {
+				tag[j++] = buf[i++];
+			}
+
+			if (strncmp(tag, "heap", strlen("heap")) == 0) {
+				old_mem_index = mem_index++;
+				if (mem_index >= MAX_SYS_MEM_ARENAS) {
+					RTE_LOG(DEBUG, EAL, "Memory arena "
+						"exceeded max limit: %d",
+						MAX_SYS_MEM_ARENAS);
+					goto done;
+				}
+				heap_stats->num_active_arena++;
+			}
+
+			continue;
+		}
+
+		if (mem_index < 0) {
+			i++;
+			continue;
+		}
+
+		if (parse_attr(buf, &i, tag, "type") < 0)
+			continue;
+
+		if (strncmp(tag, "fast", strlen("fast")) == 0) {
+			/* For total of all arenas, "heap" tag is not present
+			 * in xml. Below check is to handle that scenarios.
+			 *
+			 * FIXME: mem_index increment shall be independent of
+			 * the tag.
+			 */
+			if (old_mem_index == mem_index) {
+				mem_index++;
+				if (mem_index >= MAX_SYS_MEM_ARENAS) {
+					RTE_LOG(DEBUG, EAL, "Memory arena "
+						"exceeded max limit: %d\n",
+						MAX_SYS_MEM_ARENAS);
+					goto done;
+				}
+				heap_stats->num_active_arena++;
+			}
+			old_mem_index = mem_index;
+
+			if (parse_attr(buf, &i, tag, "count") == 0)
+				heap_stats->stats[mem_index].fast_count =
+							strtoul(tag, NULL, 10);
+			if (parse_attr(buf, &i, tag, "size") == 0)
+				heap_stats->stats[mem_index].fast_size =
+							strtoul(tag, NULL, 10);
+			continue;
+		}
+
+		if (strncmp(tag, "rest", strlen("rest")) == 0) {
+			if (parse_attr(buf, &i, tag, "count") == 0)
+				heap_stats->stats[mem_index].rest_count =
+							strtoul(tag, NULL, 10);
+			if (parse_attr(buf, &i, tag, "size") == 0)
+				heap_stats->stats[mem_index].rest_size =
+							strtoul(tag, NULL, 10);
+			continue;
+		}
+
+		if (strncmp(tag, "current", strlen("current")) == 0) {
+			if (parse_attr(buf, &i, tag, "size") == 0)
+				heap_stats->stats[mem_index].curr_size =
+							strtoul(tag, NULL, 10);
+			continue;
+		}
+
+		if (strncmp(tag, "total", strlen("total")) == 0) {
+			if (parse_attr(buf, &i, tag, "size") == 0)
+				heap_stats->stats[mem_index].heap_size =
+							strtoul(tag, NULL, 10);
+			continue;
+		}
+
+		if (strncmp(tag, "subheaps", strlen("subheaps")) == 0) {
+			if (parse_attr(buf, &i, tag, "size") == 0)
+				heap_stats->stats[mem_index].heap_count =
+							strtoul(tag, NULL, 10);
+			continue;
+		}
+
+		if (strncmp(tag, "mmap", strlen("mmap")) == 0) {
+			if (parse_attr(buf, &i, tag, "count") == 0)
+				heap_stats->stats[mem_index].mmap_count =
+							strtoul(tag, NULL, 10);
+			if (parse_attr(buf, &i, tag, "size") == 0)
+				heap_stats->stats[mem_index].mmap_size =
+							strtoul(tag, NULL, 10);
+			continue;
+		}
+
+		i++;
+	}
+
+done:
+	/* All done! Let's free the buf. */
+	free(buf);
+	return 0;
+}
+
+static int
+handle_sysmem_list_request(const char *cmd __rte_unused,
+			   const char *params __rte_unused,
+			   struct rte_tel_data *d)
+{
+	struct rte_heap_mem_stats heap_mem_stats;
+	unsigned int num_arena;
+	unsigned int i;
+
+	memset(&heap_mem_stats, 0, sizeof(struct rte_heap_mem_stats));
+	if (parse_heap_mem_stats(&heap_mem_stats) != 0)
+		return -1;
+
+	/* Note:
+	 * Total active arenas are (num_active_arena - 1). The last entry in
+	 * the array is total of all arenas.
+	 */
+	num_arena = heap_mem_stats.num_active_arena;
+
+	rte_tel_data_start_array(d, RTE_TEL_INT_VAL);
+	for (i = 0; i < num_arena; i++)
+		rte_tel_data_add_array_int(d, i);
+
+	return 0;
+}
+
+static int
+handle_sysmem_info_request(const char *cmd __rte_unused, const char *params,
+			   struct rte_tel_data *d)
+{
+	struct rte_heap_mem_stats heap_mem_stats;
+	unsigned int arena_id;
+	uint64_t free_size;
+	uint64_t free_count;
+	uint64_t allocated_size;
+
+	if (params == NULL || strlen(params) == 0 || !isdigit(*params))
+		return -1;
+
+	arena_id = (unsigned int)strtoul(params, NULL, 10);
+	if (arena_id > UINT32_MAX)
+		return -1;
+
+	if (arena_id >= MAX_SYS_MEM_ARENAS)
+		return -1;
+
+	memset(&heap_mem_stats, 0, sizeof(struct rte_heap_mem_stats));
+	if (parse_heap_mem_stats(&heap_mem_stats) != 0)
+		return -1;
+
+	if (arena_id >= heap_mem_stats.num_active_arena) {
+		RTE_LOG(DEBUG, EAL, "Memory arena exceeded max limit: %d\n",
+			MAX_SYS_MEM_ARENAS);
+		return -1;
+	}
+
+	/* Fast and rest account for the total free memory. */
+	free_size = heap_mem_stats.stats[arena_id].fast_size +
+		    heap_mem_stats.stats[arena_id].rest_size;
+
+	free_count = heap_mem_stats.stats[arena_id].fast_count +
+		     heap_mem_stats.stats[arena_id].rest_count;
+
+	/* (System memory - free size) = allocated memory size. */
+	allocated_size = heap_mem_stats.stats[arena_id].curr_size - free_size;
+
+	rte_tel_data_start_dict(d);
+	rte_tel_data_add_dict_int(d, "Arena_id", arena_id);
+	rte_tel_data_add_dict_int(d, "Allocated_size", allocated_size);
+	rte_tel_data_add_dict_u64(d, "Free_count", free_count);
+	rte_tel_data_add_dict_u64(d, "Free_size", free_size);
+	rte_tel_data_add_dict_u64(d, "Curr_size",
+				  heap_mem_stats.stats[arena_id].curr_size);
+	rte_tel_data_add_dict_u64(d, "Mmap_count",
+				  heap_mem_stats.stats[arena_id].mmap_count);
+	rte_tel_data_add_dict_u64(d, "Mmap_size",
+				  heap_mem_stats.stats[arena_id].mmap_size);
+	rte_tel_data_add_dict_u64(d, "Heap_count",
+				  heap_mem_stats.stats[arena_id].heap_count);
+	rte_tel_data_add_dict_u64(d, "Heap_size",
+				  heap_mem_stats.stats[arena_id].heap_size);
+
+	return 0;
+}
+#endif
+
 RTE_INIT(memory_telemetry)
 {
 	rte_telemetry_register_cmd(
@@ -1716,5 +2037,14 @@  RTE_INIT(memory_telemetry)
 	rte_telemetry_register_cmd(EAL_ELEMENT_INFO_REQ,
 			handle_eal_element_info_request,
 			"Returns element info. Parameters: int heap_id, memseg_list_id, memseg_id, start_elem_id, end_elem_id");
+
+#ifdef RTE_EXEC_ENV_LINUX
+	rte_telemetry_register_cmd(SYSMEMORY_LIST_REQ,
+			handle_sysmem_list_request,
+			"Returns element information. Takes no parameters");
+	rte_telemetry_register_cmd(SYSMEMORY_INFO_REQ,
+			handle_sysmem_info_request,
+			"Returns element information. Parameters: int arena_id");
+#endif
 }
 #endif