Average relation size of Apache AGE

I recognize that TOAST can store the data in an efficient way. And I tested some large data, 8.9 billion records and 20TB data in PostgreSQL in the past. The following is the article I wrote about the test.

PostgreSQLに20TB・89億件のデータを入れてみる

But I also recognize that I need to check the actual size of the data in the graph database.

Test Scenario

After loading the data, I checked the size of the graph data with the following query in Appendix .

RelationSize# of recordsavg size of records
Actor1040384048115216.2286189
Film57163776222280257.1701278
ACTED_IN123527168767492160.9491278

And each node and edge has the following properties.

Actor: 3 properties
Film: 6 properties
ACTED_IN: 3 properties

If a graph has the similar number of properties, the size of the items in the graph will be similar to the following:
Node: 237 Bytes
Edge: 161 Bytes

Estimation

Total number of nodes: 0.2 billion
Total relation size estimated: 200,000,000 * 237 / 1024(KB) / 1024(MB) / 1024(GB) = 44.1GB

Total number of edges: 0.4 billion
Total relation size estimated: 400,000,000 * 161 / 1024(KB) / 1024(MB) / 1024(GB) = 60.0GB

To be frank, the average size of nodes / edges depends on the number of properties and the complexity of them such as number, text.

Appendix

Sample code

#!/usr/bin/env python3
# -*- coding: utf-8 -*-

import asyncio
import os
from agefreighter import Factory


async def main():
    class_name = "CSVFreighter"
    instance = Factory.create_instance(class_name)

    await instance.connect(
        dsn=os.environ["PG_CONNECTION_STRING"],
        max_connections=64,
    )
    await instance.load(
        graph_name="AgeTester",
        start_v_label="Actor",
        start_id="ActorID",
        start_props=["Actor"],
        edge_type="ACTED_IN",
        end_v_label="Film",
        end_id="FilmID",
        end_props=["Film", "Year", "Votes", "Rating"],
        csv="./actorfilms.csv",
        drop_graph=True,
    )


if __name__ == "__main__":
    asyncio.run(main())

Actual relation size

postgres=> SELECT pg_total_relation_size('"AgeTester"."Actor"') AS size;
   size
----------
 10403840
(1 row)

postgres=> select count(*) from "AgeTester"."Actor";
 count
-------
 48115
(1 row)

postgres=> SELECT pg_total_relation_size('"AgeTester"."Film"') AS size;
   size
----------
 57163776
(1 row)

Actual number of records

postgres=> select count(*) from "AgeTester"."Film";
 count
--------
 222280
(1 row)

postgres=> SELECT pg_total_relation_size('"AgeTester"."ACTED_IN"') AS size;
   size
-----------
 123527168
(1 row)

postgres=> select count(*) from "AgeTester"."ACTED_IN";
 count
--------
 767492
(1 row)

Relation Schema

postgres=> \d+ "AgeTester"."Actor";
                                                                                               Table "AgeTester.Actor"
   Column   |  Type   | Collation | Nullable |                                                     Default                                                     | Storage  | Compression | Stats target | Description
------------+---------+-----------+----------+-----------------------------------------------------------------------------------------------------------------+----------+-------------+--------------+-------------
 id         | graphid |           | not null | _graphid(_label_id('AgeTester'::name, 'Actor'::name)::integer, nextval('"AgeTester"."Actor_id_seq"'::regclass)) | plain    |             |              |
 properties | agtype  |           | not null | agtype_build_map()                                                                                              | extended |             |              |
Indexes:
    "Actor_id_idx" btree (id)
    "Actor_properties_idx" gin (properties)
Inherits: "AgeTester"._ag_label_vertex
Access method: heap

postgres=> \d+ "AgeTester"."Film";
                                                                                              Table "AgeTester.Film"
   Column   |  Type   | Collation | Nullable |                                                    Default                                                    | Storage  | Compression | Stats target | Description
------------+---------+-----------+----------+---------------------------------------------------------------------------------------------------------------+----------+-------------+--------------+-------------
 id         | graphid |           | not null | _graphid(_label_id('AgeTester'::name, 'Film'::name)::integer, nextval('"AgeTester"."Film_id_seq"'::regclass)) | plain    |             |              |
 properties | agtype  |           | not null | agtype_build_map()                                                                                            | extended |             |              |
Indexes:
    "Film_id_idx" btree (id)
    "Film_properties_idx" gin (properties)
Inherits: "AgeTester"._ag_label_vertex
Access method: heap

postgres=> \d+ "AgeTester"."ACTED_IN";
                                                                                                Table "AgeTester.ACTED_IN"
   Column   |  Type   | Collation | Nullable |                                                        Default                                                        | Storage  | Compression | Stats target | Description
------------+---------+-----------+----------+-----------------------------------------------------------------------------------------------------------------------+----------+-------------+--------------+-------------
 id         | graphid |           | not null | _graphid(_label_id('AgeTester'::name, 'ACTED_IN'::name)::integer, nextval('"AgeTester"."ACTED_IN_id_seq"'::regclass)) | plain    |             |              |
 start_id   | graphid |           | not null |                                                                                                                       | plain    |             |              |
 end_id     | graphid |           | not null |                                                                                                                       | plain    |             |              |
 properties | agtype  |           | not null | agtype_build_map()                                                                                                    | extended |             |              |
Indexes:
    "ACTED_IN_end_id_idx" btree (end_id)
    "ACTED_IN_start_id_idx" btree (start_id)
Inherits: "AgeTester"._ag_label_edge
Access method: heap

Sample Data

postgres=> select * from "AgeTester"."Actor" limit 1;
       id        |                            properties
-----------------+------------------------------------------------------------------
 844424930131969 | {"id": "nm0000001", "Actor": "Fred Astaire", "ActorSerial": "1"}
(1 row)

postgres=> select * from "AgeTester"."Film" limit 1;
        id        |                                                   properties
------------------+-----------------------------------------------------------------------------------------------------------------
 1125899906842625 | {"id": "tt0082449", "Film": "Ghost Story", "Year": "1981", "Votes": "7731", "Rating": "6.3", "FilmSerial": "1"}
(1 row)

postgres=> select * from "AgeTester"."ACTED_IN" limit 1;
        id        |    start_id     |      end_id      |                            properties
------------------+-----------------+------------------+-------------------------------------------------------------------
 1407374883553281 | 844424930131969 | 1125899906842625 | {"Genre": "Commedy", "Director": "Joe Dummy", "ActedSerial": "1"}
(1 row)
タイトルとURLをコピーしました