[#22] Get LSN #23

GuChad369 · 2024-09-03T23:10:30Z

I have added the new function pgmoneta_ext_get_lsn() and tested it on PostgreSQL 15.

[postgres@fedora-linux-38 pgsql]$ psql -h localhost -p 5432 -U repl postgres 
Password for user repl: 
psql (15.1)
Type "help" for help.

postgres=# SELECT pgmoneta_ext_get_lsn();
 pgmoneta_ext_get_lsn 
----------------------
 0/8D072BF0
(1 row)

postgres=# SELECT pg_current_wal_lsn();
 pg_current_wal_lsn 
--------------------
 0/8D072BF0
(1 row)

The result is the same when compared to the PostgreSQL function, so I believe it's working correctly.

The next step is to create a new test suite to test this in the container environment.

jesperpedersen · 2024-09-04T02:08:47Z

Yeah, looks correct.

You can go through https://www.crunchydata.com/blog/postgres-wal-files-and-sequuence-numbers to get more ideas

GuChad369 · 2024-09-04T23:30:05Z

I have updated the test suite and tested it. This function works on all five versions.

Jubilee101 · 2024-09-09T08:06:25Z

Sorry about the delay.. Not sure if this is what we want. Think about why we need that start lsn, we may need to redo everything that's not written to disk since backup starts when restoring that backup. And you are only getting the lastest lsn, which doesn't necessarilly mean that everything earlier than this has been written to disk.

Take a look at do_pg_backup_start() in postgres and see if we can recreate it... or borrow it directly. It should be easy if we can just borrow it. It's the privilege issue that worries me, since it involves triggering a check point, or at least wait for a regular check point to happen.

jesperpedersen · 2024-09-09T21:18:23Z

Try and look at pgbackrest too how they use the function.

Note,

src/db/db.c:        strZ(pgWalName(pgVersion)), pgVersion >= PG_VERSION_15 ? "backup_start" : "start_backup");

GuChad369 · 2024-09-10T16:58:30Z

And you are only getting the lastest lsn, which doesn't necessarilly mean that everything earlier than this has been written to disk.
Is it possible to first force a checkpoint to ensure everything is written to disk? We have created the function pgmoneta_ext_checkpoint to achieve this.

Take a look at do_pg_backup_start() in postgres and see if we can recreate it... or borrow it directly.
What is the purpose of recreating this function? Could you clarify the next steps? Thank you so much!

Jubilee101 · 2024-09-11T04:56:51Z

Is it possible to first force a checkpoint to ensure everything is written to disk? We have created the function pgmoneta_ext_checkpoint to achieve this.

If you take a look into pg_backup_start() that's essentially what the function does -- forcing or waiting for a checkpoint and get the redo point afterwards. What's bugging me is that normally "only superusers or users with the privileges of the pg_checkpoint role can call CHECKPOINT". So I'm not entirely sure if our user with "replication" role is allowed to do that. You need to investigate a bit on that.

What is the purpose of recreating this function? Could you clarify the next steps? Thank you so much!

You can either recreate the function or simply just borrow it in the extension API, whichever suits you. But either way make sure you understand the function internals. The purpose of the function is to get a start LSN for the backup. You see you don't stop the world while making a backup. But copying the entire data dir takes time, which means some transactions can come along during that time doing some changes on both the files you've already copied and the ones you haven't. So there's an internal inconsistency, some files reflect the changes of that transaction, while others don't.

But this is fine because we have WAL and the start LSN. So when restoring a backup, postgres always looks up the start LSN and starting from there, redo everything according to the log records up until the end LSN. So that the internal inconsistency is eliminated. That's why we keep saying that getting start/end LSN is crucial for making the backup. We are also interested in how pgbackrest does it, since unlike us it doesn't have access to postgres' internal functions.

jesperpedersen · 2024-09-11T13:10:05Z

The basic idea is basically the same as how PostgreSQL 17 does it.

Upload the latest manifest file to the extension
pg_backup_start()
pg_backup_stop()
Compare the 2 manifest files
Send the changes back to the client

You can use all the functions in https://www.postgresql.org/docs/13/functions-admin.html#FUNCTIONS-ADMIN-BACKUP to help you so an idea to be to go through all those functions.

One of the key points of this is that in order to force a CHECKPOINT the replication user need to have either SUPERUSER or pg_checkpoint in its roles. This need to be documented with pros and cons.

So, if the replication user doesn't have those we will need to wait for a checkpoint to happen - just like pgbackrest. Therefore there are two "modes" of this functionality.

Start by just doing a full backup - e.g. without the manifest files - using only the extention plus the checkpoint functionality in one of its forms

GuChad369 · 2024-09-11T21:25:08Z

So I'm not entirely sure if our user with "replication" role is allowed to do that.

Regarding this part, the extension does not impose any restrictions, allowing us to call internal functions without requiring superuser privileges. I tested this when creating the pgmoneta_ext_checkpoint function. Although the function usually requires the superuser or pg_checkpoint role to execute, our source code performs its own privilege checks. After removing all such checks and using our replication role to call the function, it still succeeded. This indicates that we can perform any action within the extension, and if we do not explicitly restrict privileges, there will be no limitations on what roles can execute the function. We need to handle the privilege checks in extension.

I have reviewed the code for the function and developed a basic understanding of it. I will continue exploring it further, but at the moment, everything still feels a bit broad.

Could you help me with a specific task— e.g. finding a way to send the manifest to the PostgreSQL server? This approach works better for me. I'm good at solving specific problems, but when it comes to large projects, I sometimes struggle to figure out where to start.

jesperpedersen · 2024-09-12T02:41:42Z

Start by calling pg_backup_start() and pg_backup_stop() and put the data into a struct.

Remember that older versions they were called pg_start_backup() / pg_stop_backup().

Maybe create GUCs that will allow a socket server to receive data - the manifest - and look into creating a protocol; maybe look at how https://www.postgresql.org/docs/17/protocol-replication.html does it

GuChad369 · 2024-09-12T21:29:14Z

I will first try to create two functions, pgmoneta_ext_start_backup and pgmoneta_ext_stop_backup so that we can utilize them on our core side.

Based on our previous discussions and my understanding of the pg_backup_start() and pg_backup_stop() processes, here’s a struct definition:

typedef struct BackupInfo
{
    char *backup_label;
    XLogRecPtr start_lsn;
    XLogRecPtr stop_lsn;
} BackupInfo;

If there is any additional information we need, please let me know.

Datum
pgmoneta_ext_start_backup(PG_FUNCTION_ARGS)
{
    // Declare a structure to store backup-related information
    BackupInfo *info; 

    // Calling the existing pg_backup_start function directly based on PostgreSQL version
    Datum result = DirectFunctionCall1(pg_backup_start, CStringGetDatum("backup_label"));

    // Populate the info structure with necessary backup details
    ......

    // Return the backup information
    PG_RETURN_LSN(info);
}

The above is my pseudocode for the function. Please feel free to provide any advice if it doesn't serve the intended purpose.

jesperpedersen · 2024-09-12T21:56:02Z

Sounds like you are heading in the right direction.

You can just make them static though, and just expose a single function pgmoneta_ext_full_backup() that uses those static functions. The key is to get the .tar generated, and sent back over the wire - I think the core replication protocol is a good way to do it

jesperpedersen requested review from jesperpedersen and Jubilee101 September 4, 2024 02:18

jesperpedersen assigned GuChad369 Sep 4, 2024

jesperpedersen added the feature New feature label Sep 4, 2024

[pgmoneta#22] Get LSN

90ae42e

GuChad369 force-pushed the lsn branch from 95fd251 to 90ae42e Compare September 4, 2024 23:24

Jubilee101 mentioned this pull request Sep 27, 2024

[#22] Initial pgmoneta_ext_full_backup() #24

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[#22] Get LSN #23

[#22] Get LSN #23

GuChad369 commented Sep 3, 2024

jesperpedersen commented Sep 4, 2024

GuChad369 commented Sep 4, 2024

Jubilee101 commented Sep 9, 2024

jesperpedersen commented Sep 9, 2024

GuChad369 commented Sep 10, 2024

Jubilee101 commented Sep 11, 2024 •

edited

Loading

jesperpedersen commented Sep 11, 2024

GuChad369 commented Sep 11, 2024 •

edited

Loading

jesperpedersen commented Sep 12, 2024

GuChad369 commented Sep 12, 2024

jesperpedersen commented Sep 12, 2024

[#22] Get LSN #23

Are you sure you want to change the base?

[#22] Get LSN #23

Conversation

GuChad369 commented Sep 3, 2024

jesperpedersen commented Sep 4, 2024

GuChad369 commented Sep 4, 2024

Jubilee101 commented Sep 9, 2024

jesperpedersen commented Sep 9, 2024

GuChad369 commented Sep 10, 2024

Jubilee101 commented Sep 11, 2024 • edited Loading

jesperpedersen commented Sep 11, 2024

GuChad369 commented Sep 11, 2024 • edited Loading

jesperpedersen commented Sep 12, 2024

GuChad369 commented Sep 12, 2024

jesperpedersen commented Sep 12, 2024

Jubilee101 commented Sep 11, 2024 •

edited

Loading

GuChad369 commented Sep 11, 2024 •

edited

Loading