Myon's Debian Blog

This feed contains pages in the "debian" category.

Debian is running a "vcswatch" service that keeps track of the status of all packaging repositories that have a Vcs-Git (and other VCSes) header set and shows which repos might need a package upload to push pending changes out.

Naturally, this is a lot of data and the scratch partition on qa.debian.org had to be expanded several times, up to 300 GB in the last iteration. Attempts to reduce that size using shallow clones (git clone --depth=50) did not result in more than a few percent of space saved. Running git gc on all repos helps a bit, but is tedious, and as Debian is growing, the repos are still growing both in size and number. I ended up blocking all repos with checkouts larger than a gigabyte, and still the only cure was expanding the disk or lowering the blocking threshold.

Since we only need a tiny bit of info from the repositories, namely the content of debian/changelog and a few other files from debian/, plus the number of commits since the last tag on the packaging branch, it made sense to try to get the info without fetching a full repo clone. The question of whether we could grab that solely using the GitLab API at salsa.debian.org was never really answered. But then, in #1032623, Gábor Németh suggested the use of git clone --filter blob:none. As things go, this sat unattended in the bug report for almost a year until the next "disk full" event made me give it a try.

The blob:none filter makes git clone omit all files, fetching only commit and tree information. Any blob (file content) needed at git run time is transparently fetched from the upstream repository, and stored locally. It turned out to be a game-changer. The (largish) repositories I tried it on shrank to 1/100 of the original size.

Poking around I figured we could even do better by using tree:0 as filter. This additionally omits all trees from the git clone, again only fetching the information at run time when needed. Some of the larger repos I tried it on shrank to 1/1000 of their original size.
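A minimal sketch of what such a clone can look like. The function names are made up for illustration, and using a bare clone is an assumption (it avoids the checkout, which would fetch the file contents again); how vcswatch itself invokes git is not shown here.

```shell
#!/bin/sh
# Hypothetical helpers, not taken from vcswatch itself.

# Clone URL into DIR, leaving out whatever FILTER excludes.
# FILTER is "blob:none" (no file contents) or "tree:0" (no trees either);
# it defaults to tree:0.
clone_filtered() {
    url=$1
    dir=$2
    filter=${3:-tree:0}
    git clone --quiet --bare --filter="$filter" "$url" "$dir"
}

# Print debian/changelog from the default branch. The missing tree and
# blob objects are fetched from the remote on demand and stored locally.
show_changelog() {
    git -C "$1" show HEAD:debian/changelog
}

# Usage (needs network access; GROUP/PACKAGE is a placeholder):
#   clone_filtered https://salsa.debian.org/GROUP/PACKAGE.git pkg.git
#   show_changelog pkg.git | head -n 1
#   du -sh pkg.git
```

Comparing the du output for the two filter values against a full clone of the same repository is an easy way to see the size difference on a given package.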

I deployed the new option on qa.debian.org and scheduled all repositories to fetch a new clone on the next scan:

The initial dip from 100% to 95% is my first "what happens if we block repos > 500 MB" attempt. Over the week after that, the git filter clones reduced the overall disk consumption from almost 300 GB to 15 GB, i.e. to 1/20 of the original size. Some repos shrank from GBs to below a MB.

Perhaps I should make all my git clones use one of the filters.

Posted Mon Mar 18 13:45:40 2024 Tags: debian

Back in 2015, when PostgreSQL 9.5 alpha 1 was released, I had posted the PostgreSQL data from Debian's popularity contest.

8 years and 8 PostgreSQL releases later, the graph now looks like this:

Currently, the most popular PostgreSQL on Debian systems is still PostgreSQL 13 (shipped in Bullseye), followed by PostgreSQL 11 (Buster). At the time of writing, PostgreSQL 9.6 (Stretch) and PostgreSQL 15 (Bookworm) share the third place, with 15 rising quickly.

Posted Sat Aug 26 23:49:40 2023 Tags: debian postgresql

pg_dirtyread

Earlier this week, I updated pg_dirtyread to work with PostgreSQL 14. pg_dirtyread is a PostgreSQL extension that allows reading "dead" rows from tables, i.e. rows that have already been deleted or updated. Of course that works only if the table has not been cleaned up yet by a VACUUM command or autovacuum, which is PostgreSQL's garbage collection machinery.

Here's an example of pg_dirtyread in action:

# create table foo (id int, t text);
CREATE TABLE
# insert into foo values (1, 'Doc1');
INSERT 0 1
# insert into foo values (2, 'Doc2');
INSERT 0 1
# insert into foo values (3, 'Doc3');
INSERT 0 1

# select * from foo;
 id │  t
────┼──────
  1 │ Doc1
  2 │ Doc2
  3 │ Doc3
(3 rows)

# delete from foo where id < 3;
DELETE 2

# select * from foo;
 id │  t
────┼──────
  3 │ Doc3
(1 row)

Oops! The first two documents have disappeared.

Now let's use pg_dirtyread to look at the table:

# create extension pg_dirtyread;
CREATE EXTENSION

# select * from pg_dirtyread('foo') t(id int, t text);
 id │  t
────┼──────
  1 │ Doc1
  2 │ Doc2
  3 │ Doc3

All three documents are still there, but only one of them is visible.

pg_dirtyread can also show PostgreSQL's system columns with the row location and visibility information. For the first two documents, xmax is set, which means the row has been deleted:

# select * from pg_dirtyread('foo') t(ctid tid, xmin xid, xmax xid, id int, t text);
 ctid  │ xmin │ xmax │ id │  t
───────┼──────┼──────┼────┼──────
 (0,1) │ 1577 │ 1580 │  1 │ Doc1
 (0,2) │ 1578 │ 1580 │  2 │ Doc2
 (0,3) │ 1579 │    0 │  3 │ Doc3
(3 rows)

Undelete

Caveat: I'm not promising any of the ideas quoted below will actually work in practice. There are a few caveats and a good portion of intricate knowledge about the PostgreSQL internals might be required to succeed properly. Consider consulting your favorite PostgreSQL support channel for advice if you need to recover data on any production system. Don't try this at work.

I always had plans to extend pg_dirtyread to include some "undelete" command to make deleted rows reappear, but never got around to trying that. But rows can already be restored by using the output of pg_dirtyread itself:

# insert into foo select * from pg_dirtyread('foo') t(id int, t text) where id = 1;

This is not a true "undelete", though - it just inserts new rows from the data read from the table.

pg_surgery

Enter pg_surgery, which is a new PostgreSQL extension supplied with PostgreSQL 14. It contains two functions to "perform surgery on a damaged relation". As a side-effect, they can also make deleted tuples reappear.

As I discovered now, one of the functions, heap_force_freeze(), works nicely with pg_dirtyread. It takes a list of ctids (row locations) that it marks "frozen", but at the same time as "not deleted".

Let's apply it to our test table, using the ctids that pg_dirtyread can read:

# create extension pg_surgery;
CREATE EXTENSION

# select heap_force_freeze('foo', array_agg(ctid))
    from pg_dirtyread('foo') t(ctid tid, xmin xid, xmax xid, id int, t text) where id = 1;
 heap_force_freeze
───────────────────

(1 row)

Et voilà, our deleted document is back:

# select * from foo;
 id │  t
────┼──────
  1 │ Doc1
  3 │ Doc3
(2 rows)

# select * from pg_dirtyread('foo') t(ctid tid, xmin xid, xmax xid, id int, t text);
 ctid  │ xmin │ xmax │ id │  t
───────┼──────┼──────┼────┼──────
 (0,1) │    2 │    0 │  1 │ Doc1
 (0,2) │ 1578 │ 1580 │  2 │ Doc2
 (0,3) │ 1579 │    0 │  3 │ Doc3
(3 rows)

Disclaimer

Most importantly, none of the above methods will work if the data you just deleted has already been purged by VACUUM or autovacuum. These actively zero out reclaimed space. Restore from backup to get your data back.

Since both pg_dirtyread and pg_surgery operate outside the normal PostgreSQL MVCC machinery, it's easy to create corrupt data using them. This includes duplicated rows, duplicated primary key values, indexes being out of sync with tables, broken foreign key constraints, and others. You have been warned.

pg_dirtyread does not work (yet) if the deleted rows contain any toasted values. Possible other approaches include using pageinspect and pg_filedump to retrieve the ctids of deleted rows.

Please make sure you have working backups and don't need any of the above.

Posted Wed Nov 17 16:46:51 2021 Tags: debian postgresql

The apt.postgresql.org repository has been extended to cover the arm64 architecture.

We had occasionally received user requests to add "arm" in the past, but it was never really clear which kind of "arm" made sense to target for PostgreSQL. In terms of Debian architectures, there's (at least) armel, armhf, and arm64. Furthermore, Raspberry Pis are very popular (and indeed what most users seemed to be asking about), but the raspbian "armhf" port is incompatible with the Debian "armhf" port.

Now that most hardware has moved to 64-bit, it was becoming clear that "arm64" was the way to go. Amit Khandekar arranged for HUAWEI Cloud Services to donate an arm64 build host with enough resources to build the arm64 packages at the same speed as the existing amd64, i386, and ppc64el architectures. A few days later, all the build jobs were done, including passing all test suites. Very few arm-specific issues were encountered, which makes me confident that arm64 is a solid architecture to run PostgreSQL on.

We are targeting Debian buster (stable), bullseye (testing), and sid (unstable), and Ubuntu bionic (18.04) and focal (20.04). To use the arm64 archive, just add the normal sources.list entry:

deb https://apt.postgresql.org/pub/repos/apt buster-pgdg main

Ubuntu focal

At the same time, I've added the next Ubuntu LTS release to apt.postgresql.org: focal (20.04). It ships amd64, arm64, and ppc64el binaries.

deb https://apt.postgresql.org/pub/repos/apt focal-pgdg main

Old PostgreSQL versions

Many PostgreSQL extensions still support older server versions that are EOL. For testing these extensions, server packages need to be available. I've built packages for PostgreSQL 9.2+ on all Debian distributions, and all Ubuntu LTS distributions. 9.1 will follow shortly.

This means people can move to newer base distributions in their .travis.yml, .gitlab-ci.yml, and other CI files.

Posted Mon May 4 11:20:28 2020 Tags: debian postgresql

Users had often asked where they could find older versions of packages from apt.postgresql.org. I had been collecting these since about April 2013, and in July 2016, I made the packages available via an ad-hoc URL on the repository master host, called "the morgue". There was little repository structure, all files belonging to a source package were stuffed into a single directory, no matter what distribution they belonged to. Besides this not being particularly accessible for users, the main problem was the ever-increasing need for more disk space on the repository host. We are now at 175 GB for the archive, of which 152 GB is for the morgue.

Our friends from yum.postgresql.org have had a proper archive host (yum-archive.postgresql.org) for some time already, so it was about time to follow suit and implement a proper archive for apt.postgresql.org as well, usable from apt.

So here it is: apt-archive.postgresql.org

The archive covers all past and current Debian and Ubuntu distributions. The apt sources.list entries are similar to the main repository, just with "-archive" appended to the host name and the distribution:

deb https://apt-archive.postgresql.org/pub/repos/apt DIST-pgdg-archive main
deb-src https://apt-archive.postgresql.org/pub/repos/apt DIST-pgdg-archive main
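For scripted setups, the two lines can be generated from the distribution codename. This is only a sketch; the function name and the target file name in the usage comment are assumptions, not something the repository prescribes.

```shell
#!/bin/sh
# Hypothetical helper: print the archive sources.list entries for one
# distribution codename (e.g. "buster" or "focal").
pgdg_archive_sources() {
    dist=$1
    base=https://apt-archive.postgresql.org/pub/repos/apt
    printf 'deb %s %s-pgdg-archive main\n' "$base" "$dist"
    printf 'deb-src %s %s-pgdg-archive main\n' "$base" "$dist"
}

# Usage (the file name is an assumption; any *.list file in
# /etc/apt/sources.list.d/ is read by apt):
#   pgdg_archive_sources "$(lsb_release -cs)" |
#       sudo tee /etc/apt/sources.list.d/pgdg-archive.list
```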

The oldest PostgreSQL server versions covered there are 8.2.23, 8.3.23, 8.4.17, 9.0.13, 9.1.9, 9.2.4, 9.3beta1, and everything newer.

An example:

$ apt-cache policy postgresql-12
postgresql-12:
  Installed: 12.2-2.pgdg+1+b1
  Candidate: 12.2-2.pgdg+1+b1
  Version table:
 *** 12.2-2.pgdg+1+b1 900
        500 http://apt.postgresql.org/pub/repos/apt sid-pgdg/main amd64 Packages
        500 https://apt-archive.postgresql.org/pub/repos/apt sid-pgdg-archive/main amd64 Packages
        100 /var/lib/dpkg/status
     12.2-2.pgdg+1 500
        500 https://apt-archive.postgresql.org/pub/repos/apt sid-pgdg-archive/main amd64 Packages
     12.2-1.pgdg+1 500
        500 https://apt-archive.postgresql.org/pub/repos/apt sid-pgdg-archive/main amd64 Packages
     12.1-2.pgdg+1 500
        500 https://apt-archive.postgresql.org/pub/repos/apt sid-pgdg-archive/main amd64 Packages
     12.1-1.pgdg+1 500
        500 https://apt-archive.postgresql.org/pub/repos/apt sid-pgdg-archive/main amd64 Packages
     12.0-2.pgdg+1 500
        500 https://apt-archive.postgresql.org/pub/repos/apt sid-pgdg-archive/main amd64 Packages
     12.0-1.pgdg+1 500
        500 https://apt-archive.postgresql.org/pub/repos/apt sid-pgdg-archive/main amd64 Packages
     12~rc1-1.pgdg+1 500
        500 https://apt-archive.postgresql.org/pub/repos/apt sid-pgdg-archive/main amd64 Packages
     12~beta4-1.pgdg+1 500
        500 https://apt-archive.postgresql.org/pub/repos/apt sid-pgdg-archive/main amd64 Packages
     12~beta3-1.pgdg+1 500
        500 https://apt-archive.postgresql.org/pub/repos/apt sid-pgdg-archive/main amd64 Packages
     12~beta2-1.pgdg+1 500
        500 https://apt-archive.postgresql.org/pub/repos/apt sid-pgdg-archive/main amd64 Packages
     12~beta1-1.pgdg+1 500
        500 https://apt-archive.postgresql.org/pub/repos/apt sid-pgdg-archive/main amd64 Packages

Because this is hosted on S3, browsing directories is only supported indirectly by static index.html files, so if you want to look at some specific URL, append "/index.html" to see it.
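When fetching directory listings from a script, the suffix can be added mechanically. A small sketch with a made-up function name:

```shell
#!/bin/sh
# Hypothetical helper: turn a directory URL on the archive into the URL
# of its static index page.
archive_index_url() {
    # Strip one trailing slash, if present, then append /index.html.
    printf '%s/index.html\n' "${1%/}"
}

# Usage:
#   curl -s "$(archive_index_url https://apt-archive.postgresql.org/pub/repos/apt/)"
```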

The archive is powered by a PostgreSQL database and a bunch of python/shell scripts, from which the apt index files are built.

Archiving old distributions

I'm also using the opportunity to remove some long-retired distributions from the main repository host. The following distributions have been moved over:

Debian etch (4.0)
Debian lenny (5.0)
Debian squeeze (6.0)
Ubuntu lucid (10.04)
Ubuntu saucy (13.10)
Ubuntu utopic (14.10)
Ubuntu wily (15.10)
Ubuntu zesty (17.04)
Ubuntu cosmic (18.10)

They are available as "DIST-pgdg" from the archive, e.g. squeeze:

deb https://apt-archive.postgresql.org/pub/repos/apt squeeze-pgdg main
deb-src https://apt-archive.postgresql.org/pub/repos/apt squeeze-pgdg main

Posted Tue Mar 24 12:08:48 2020 Tags: debian postgresql

paste is one of those tools nobody uses [1]. It puts two files side by side, line by line.

One application for this came up today where some tool was called for several files at once and would spit out one line per file, but unfortunately not including the filename.

$ paste <(ls *.rpm) <(ls *.rpm | xargs -r rpm -q --queryformat '%{name} \n' -p)
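The same pairing can be tried without any RPM files at hand. The sketch below uses made-up file and package names, and temporary files instead of process substitution so that it also runs in a plain POSIX sh:

```shell
#!/bin/sh
# Self-contained illustration; the file and package names are invented.
paste_demo() {
    tmp=$(mktemp -d)
    # One input holds the file names, the other the tool's output,
    # one line per file and in the same order.
    printf '%s\n' foo-1.0.rpm bar-2.3.rpm > "$tmp/files"
    printf '%s\n' foo bar > "$tmp/names"
    # paste joins line N of each input, separated by a tab.
    paste "$tmp/files" "$tmp/names"
    rm -r "$tmp"
}

paste_demo
```

Each output line then carries the file name in the first column and the matching tool output in the second.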

[1] See "J" in The ABCs of Unix

[PS: I meant to blog this in 2011, but apparently never committed the file...]

Posted Fri Mar 9 10:06:21 2018 Tags: debian unix

After quite some time (years actually) of inactivity as Debian Account Manager, I finally decided to give back that Debian hat. I'm stepping down as DAM. I will still be around for the occasional comment from the peanut gallery, or to provide input if anyone actually cares to ask me about the old times.

Thanks for the fish!

Posted Fri Mar 9 09:58:06 2018 Tags: debian

Now that Salsa is in beta, it's time to import projects (= GitLab speak for "repository"). This is probably best done automated. Head to Access Tokens and generate a token with "api" scope, which you can then use with curl:

$ cat salsa-import
#!/bin/sh

set -eux

PROJECT="${1%.git}"
DESCRIPTION="$PROJECT packaging"
ALIOTH_URL="https://anonscm.debian.org/git"
ALIOTH_GROUP="collab-maint"
SALSA_URL="https://salsa.debian.org/api/v4"
SALSA_GROUP="debian" # "debian" has id 2
SALSA_TOKEN="yourcryptictokenhere"

# map group name to namespace id (this is slow on large groups, see https://gitlab.com/gitlab-org/gitlab-ce/issues/42415)
SALSA_NAMESPACE=$(curl -s "$SALSA_URL/groups/$SALSA_GROUP" | jq '.id')

# trigger import
curl -f "$SALSA_URL/projects?private_token=$SALSA_TOKEN" \
  --data "path=$PROJECT&namespace_id=$SALSA_NAMESPACE&description=$DESCRIPTION&import_url=$ALIOTH_URL/$ALIOTH_GROUP/$PROJECT&visibility=public"

This will create the GitLab project in the chosen namespace, and import the repository from Alioth.

Pro tip: To import a whole Alioth group to GitLab, run this on Alioth:

for f in *.git; do sh salsa-import "$f"; done

(Update 2018-02-04: Query namespace ID via the API)

Posted Mon Dec 25 16:43:30 2017 Tags: debian

About a week ago, I extended vcswatch to also look at tags in git repositories.

Previously, it was solely paying attention to the version number in the top paragraph in debian/changelog, and would alert if that version didn't match the package version in Debian unstable or experimental. The idea is that "UNRELEASED" versions will keep nagging the maintainer (via DDPO) not to forget that some day this package needs an upload. This works for git, svn, bzr, hg, cvs, mtn, and darcs repositories (in decreasing order of actual usage numbers in Debian; I had actually tried to add arch support as well, but that VCS is so weird that it wasn't worth the trouble).

There are several shortcomings in that simple approach:

Some packages update debian/changelog only at release time, e.g. auto-generated from the git changelog using git-dch
Missing or misplaced release tags are not detected

The new mechanism fixes this for git repositories by also looking at the output of git describe --tags. If there are any commits since the last tag, and the vcswatch status according to debian/changelog would otherwise be "OK", a new status "COMMITS" is set. DDPO will report e.g. "1.4-1+2", to be read as "2 commits since the tag [debian/]1.4-1".
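To illustrate the notation: git describe --tags prints just the tag when HEAD is tagged, and TAG-N-gHASH otherwise, where N is the number of commits since the tag. The sketch below turns that into the "1.4-1+2" form; it is a hypothetical re-implementation with a made-up function name, and vcswatch's actual code may well differ.

```shell
#!/bin/sh
# Hypothetical formatter, not vcswatch's real code.
# Input:  output of "git describe --tags", e.g. debian/1.4-1-2-gabc1234
# Output: version plus commit count, e.g. 1.4-1+2
format_describe() {
    printf '%s\n' "$1" |
        sed -E -e 's#^debian/##' -e 's/-([0-9]+)-g[0-9a-f]+$/+\1/'
}

# Usage inside a packaging repository:
#   format_describe "$(git describe --tags)"
```

A describe string without the -N-gHASH suffix passes through unchanged apart from the stripped "debian/" prefix, which corresponds to the case where there are no commits since the tag.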

Of the 16644 packages using git in Debian, currently 7327 are "OK", 2649 are in the new "COMMITS" state, and 4227 are "NEW". 723 are "OLD" and 79 are "UNREL" which indicates that the package in Debian is ahead of the git repository. 1639 are in an ERROR state.

So far the new mechanism works for git only, but other VCSes could be added as well.

Posted Sun May 29 19:49:28 2016 Tags: debian

I knew it was about this time of the year 10 years ago when my Debian account was created, but I couldn't remember the exact date until I looked it up earlier this evening: today :). Rene Engelhard had been my advocate, and Marc Brockschmidt my AM. Thanks guys!

A lot of time has passed since then, and I've worked in various parts of the project. I became an application manager almost immediately, and quickly got into the NM front desk as well, revamping parts of the NM process which had become pretty bureaucratic (I think we are now, 10 years later, back where we should be, thanks to almost all of the paperwork being automated, thanks Enrico!). I've processed 37 NMs, most of them between 2005 and 2008; later I was only active as front desk and eventually Debian account manager. I've recently picked up AMing again, which I still find quite refreshing as the AM will always also learn new things.

Quality Assurance was and is the other big field. Starting by doing QA uploads of orphaned packages, I attended some QA meetings around Germany, and picked up maintenance of the DDPO pages, which I still maintain. The link between QA and NM is the MIA team where I was active for some years until they kindly kicked me out because I was MIA there myself. I'm glad they are still using some of the scripts I was writing to automate some things.

My favorite MUA is mutt, of which I became co-maintainer in 2007, and later maintainer. I'm still listed in the uploaders field, but admittedly I haven't really done anything there lately.

Also in 2007 I started working at credativ, after having been a research assistant at the university, which meant making my Debian work professional. Of course it also meant more real work and less time for the hobby part, but I was still very active around that time. Later, in 2010, I got married, and we had two kids, at which point family was of course much more important, so my Debian involvement dropped to a minimum. (Mostly lurking on IRC ;)

Being a PostgreSQL consultant at work, it was natural to start looking into the packaging, so I started submitting patches to postgresql-common in 2011, and became a co-maintainer in 2012. Since then, I've mostly been working on PostgreSQL-related packages, of which far too many have my (co-)maintainer stamp on them. To link the Debian and PostgreSQL worlds together, we started an external repository (apt.postgresql.org) that contains packages for the PostgreSQL major releases that Debian doesn't ship. Most of my open source time at the moment is spent on getting all PostgreSQL packages in shape for Debian and this repository.

According to minechangelogs, currently 844 changelog entries in Debian mention my name, or were authored by me. Scrolling back yields memories of packages that are long gone again from unstable, or that I passed on to other maintainers. There are way too many people in Debian that I enjoy(ed) working with to list them here, and many of them are my friends. Debian is really the extended family on the internet. My last DebConf before this year had been in Mar del Plata - I had met some people at other conferences like FOSDEM, but meeting (almost) everyone again in Heidelberg was very nice. I even remembered all basic Mao rules :D.

So, thanks to everyone out there for making Debian such a wonderful place to be!

Posted Sat Sep 5 23:42:13 2015 Tags: debian postgresql