Recovering gitlab-ce after losing its database

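Introduction

On the evening of April 24, 2023 I received a GitLab upgrade notification, and what should have been a routine upgrade turned into a major incident. This post records the whole process of repairing GitLab and the mistakes I made along the way. I did not manage to take notes or screenshots for some steps as they happened, so those are reproduced here from backups taken midway. Consider this a piece of self-reflection.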

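Basic facts:

Current version: gitlab-ce:15.9.5
Target version: gitlab-ce:15.11.0
Operating system: Linux debian10 4.19.0-10-amd64 #1 SMP Debian 4.19.132-1 (2020-07-24) x86_64 GNU/Linux
Hardware: 2 cores, 8 GB RAM, 50 GB disk
500+ repositories and 13 gitlab-runner instances executing jobs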

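The problem appears

As usual, I ran the command below to upgrade and restart, and this time the upgrade caused a serious problem. This was my first mistake: upgrading the application inside the VM without taking a snapshot of the VM first. I had been running GitLab in Docker for more than five years and every update had been done this way, so I had no sense that this one needed special care. Besides, on a VM with only 2 cores, 8 GB RAM and a 50 GB disk, of which GitLab already occupied 30 GB, there was not much disk left for up-to-date backups.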

docker rm -f gitlab && docker pull gitlab/gitlab-ce:latest && docker run -it --log-opt max-size=10m --log-opt max-file=3 -p 443:443 --name gitlab --restart always -v /root/gitlab/config:/etc/gitlab -v /root/gitlab/logs:/var/log/gitlab -v /root/gitlab/data:/var/opt/gitlab gitlab/gitlab-ce:latest
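The missing safeguard could be as small as archiving the mounted directories before pulling a new image. The following is a minimal sketch of that idea, not something that was run during this incident. The helper name `snapshot_dirs` and the destination directory are hypothetical, and it assumes the container is stopped and the destination has room for the archives (ideally on a different disk).

```python
import shutil
import tarfile
import time
from pathlib import Path


def snapshot_dirs(sources, dest_dir, min_free_bytes=0):
    """Archive each source directory into dest_dir before an upgrade.

    Hypothetical helper: refuses to run when dest_dir has less than
    min_free_bytes available, and returns the paths of the archives written.
    """
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    free = shutil.disk_usage(dest).free
    if free < min_free_bytes:
        raise RuntimeError(f"only {free} bytes free in {dest}, refusing to snapshot")

    stamp = time.strftime("%Y%m%d-%H%M%S")
    archives = []
    for src in map(Path, sources):
        target = dest / f"{src.name}-{stamp}.tar.gz"
        with tarfile.open(target, "w:gz") as tar:
            # arcname keeps member paths relative to the directory name
            tar.add(src, arcname=src.name)
        archives.append(target)
    return archives
```

With the volume mounts from the command above, a call might look like `snapshot_dirs(["/root/gitlab/config", "/root/gitlab/data"], "/mnt/backup/gitlab")`, where `/mnt/backup/gitlab` is an assumed path.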

First error

The key part of the log:

```bash
SELECT COUNT(*) FROM container_repositories WHERE migration_state = 'import_skipped'
ERROR:  column "migration_state" does not exist at character 51

If you would like to restart the instance without attempting to
upgrade, add the following to your docker command:
-e GITLAB_SKIP_UNMIGRATED_DATA_CHECK=true
```

Look, the fix is written out right there. Easy: add the flag and restart!

Second error

To my surprise, it died again.

Running handlers:
[2023-04-27T15:20:30+00:00] INFO: Running report handlers
Running handlers complete
[2023-04-27T15:20:30+00:00] INFO: Report handlers complete
Infra Phase complete, 35/802 resources updated in 07 seconds
gitlab Reconfigured!
Checking for an omnibus managed postgresql: OK
Checking if postgresql['version'] is set: OK
Checking if we already upgraded: NOT OK
Checking for a newer version of PostgreSQL to install
Upgrading PostgreSQL to 13.8
Checking if PostgreSQL bin files are symlinked to the expected location: OK
Starting the database
Error starting the database. Please fix the error before continuing
Expected process to exit with [0], but received '1'
---- Begin output of gitlab-ctl start postgresql ----
STDOUT: fail: postgresql: runsv not running
STDERR:
---- End output of gitlab-ctl start postgresql ----
Ran gitlab-ctl start postgresql returned 1
Upgrading the existing database failed and was reverted.
Please check the output, and open an issue at:
https://gitlab.com/gitlab-org/omnibus-gitlab/issues
If you would like to restart the instance without attempting to
upgrade, add the following to your docker command:
-e GITLAB_SKIP_PG_UPGRADE=true

I added that flag too, then restarted a second time using the gitlab/gitlab-ce:15.9.5-ce.0 image!

Third error

2023-04-27_23:47:06.16695 ts=2023-04-27T23:47:06.166Z caller=repair.go:52 level=error component=tsdb msg="failed to read meta.json for a block during repair process; skipping" dir=/var/opt/gitlab/prometheus/data/01GYSNK1JKD2X5125MP0ZRPKFH err="open /var/opt/gitlab/prometheus/data/01GYSNK1JKD2X5125MP0ZRPKFH/meta.json: permission denied"
2023-04-27_23:47:06.16695 ts=2023-04-27T23:47:06.166Z caller=repair.go:52 level=error component=tsdb msg="failed to read meta.json for a block during repair process; skipping" dir=/var/opt/gitlab/prometheus/data/01GYSWERS0EHQ52Y4RT2GF29Q0 err="open /var/opt/gitlab/prometheus/data/01GYSWERS0EHQ52Y4RT2GF29Q0/meta.json: permission denied"
2023-04-27_23:47:06.16698 ts=2023-04-27T23:47:06.166Z caller=main.go:1144 level=error err="opening storage failed: migrate WAL: check first existing segment: open /var/opt/gitlab/prometheus/data/wal/00017925: permission denied"

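It was as if GitLab were telling me: you dare to restart twice, I dare to crash a third time. Still, the log made it look simple: just a permissions problem. Back to square one. GitLab can repair its own permissions, so I simply gave everything 777 and moved on.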

chmod -R 777 /root/gitlab/config
chmod -R 777 /root/gitlab/logs
chmod -R 777 /root/gitlab/data

Third restart!

Fourth error

==> /var/log/gitlab/gitlab-rails/production.log <==

NoMethodError (undefined method `throttle_unauthenticated_api_enabled' for #<ApplicationSetting id: 1, default_projects_limit: 100000, signup_enabled: false, gravatar_enabled: true, sign_in_text: nil, created_at: "2019-11-03 17:51:42.367909000 +0000", updated_at: "2020-12-03 05:49:41.241233000 +0000", home_page_url: nil, default_branch_protection: 2, help_text: nil, restricted_visibility_levels: [], version_check_enabled: true, max_attachment_size: 10, default_project_visibility: 0, default_snippet_visibility: 0, user_oauth_applications: true, after_sign_out_path: nil, session_expire_delay: 10080, import_sources: ["github", "bitbucket", "bitbucket_server", "gitlab", "google_code", "fogbugz", "git", "gitlab_project", "gitea", "manifest", "phabricator"], help_page_text: nil, shared_runners_enabled: true, max_artifacts_size: 100, runners_registration_token: nil, max_pages_size: 100, require_two_factor_authentication: false, two_factor_grace_period: 48, metrics_enabled: false, metrics_host: "localhost", metrics_pool_size: 16, metrics_timeout: 10, metrics_method_call_threshold: 10, recaptcha_enabled: false, metrics_port: 8089, akismet_enabled: false, metrics_sample_interval: 15, email_author_in_body: false, default_group_visibility: 0, repository_checks_enabled: true, shared_runners_text: nil, metrics_packet_size: 1, disabled_oauth_sign_in_sources: [], health_check_access_token: [FILTERED], container_registry_token_expire_delay: 5, after_sign_up_text: "", user_default_external: false, elasticsearch_indexing: [FILTERED], elasticsearch_search: [FILTERED], repository_storages: ["default"], enabled_git_access_protocol: nil, usage_ping_enabled: false, sign_in_text_html: "", help_page_text_html: "", shared_runners_text_html: "", after_sign_up_text_html: "", rsa_key_restriction: 0, dsa_key_restriction: 0, ecdsa_key_restriction: 0, ed25519_key_restriction: 0, housekeeping_enabled: true, housekeeping_bitmaps_enabled: true, housekeeping_incremental_repack_period: 10, 
housekeeping_full_repack_period: 50, housekeeping_gc_period: 200, html_emails_enabled: true, plantuml_url: nil, plantuml_enabled: false, shared_runners_minutes: 0, repository_size_limit: 0, terminal_max_session_time: 0, unique_ips_limit_per_user: 10, unique_ips_limit_time_window: 3600, unique_ips_limit_enabled: false, default_artifacts_expire_in: "30 days", elasticsearch_url: [FILTERED], elasticsearch_aws: [FILTERED], elasticsearch_aws_region: [FILTERED], elasticsearch_aws_access_key: nil, geo_status_timeout: 10, uuid:----------------
    product_analytics_configurator_connection_string: nil, openai_api_key: nil>
Did you mean?  throttle_unauthenticated_enabled
               throttle_unauthenticated_enabled=
               throttle_unauthenticated_enabled?
               throttle_unauthenticated_enabled_was
               throttle_authenticated_api_enabled
               throttle_authenticated_api_enabled=
               throttle_authenticated_api_enabled?
               throttle_authenticated_api_enabled_was
               throttle_authenticated_web_enabled
               throttle_authenticated_web_enabled=
               throttle_authenticated_web_enabled?
               throttle_authenticated_api_enabled_change
               throttle_authenticated_web_enabled_was):

A very long error. At this point I gave up on it and went back to an earlier one.

SELECT COUNT(*) FROM container_repositories WHERE migration_state = 'import_skipped'
ERROR:  column "migration_state" does not exist at character 51

Logging in to the database showed that the table did not have this column at all, and that it did not contain a single row.

su - gitlab-psql 
psql -h /var/opt/gitlab/postgresql -d gitlabhq_production 
select * from container_repositories;

I compared it with the CREATE TABLE statement in GitLab's official repository (https://gitlab.com/gitlab-org/gitlab/-/blob/master/db/structure.sql):

CREATE TABLE container_repositories (
    id integer NOT NULL,
    project_id integer NOT NULL,
    name character varying NOT NULL,
    created_at timestamp without time zone NOT NULL,
    updated_at timestamp without time zone NOT NULL,
    status smallint,
    expiration_policy_started_at timestamp with time zone,
    expiration_policy_cleanup_status smallint DEFAULT 0 NOT NULL,
    expiration_policy_completed_at timestamp with time zone,
    migration_pre_import_started_at timestamp with time zone,
    migration_pre_import_done_at timestamp with time zone,
    migration_import_started_at timestamp with time zone,
    migration_import_done_at timestamp with time zone,
    migration_aborted_at timestamp with time zone,
    migration_skipped_at timestamp with time zone,
    migration_retries_count integer DEFAULT 0 NOT NULL,
    migration_skipped_reason smallint,
    migration_state text DEFAULT 'default'::text NOT NULL,
    migration_aborted_in_state text,
    migration_plan text,
    last_cleanup_deleted_tags_count integer,
    delete_started_at timestamp with time zone,
    status_updated_at timestamp with time zone,
    CONSTRAINT check_05e9012f36 CHECK ((char_length(migration_plan) <= 255)),
    CONSTRAINT check_13c58fe73a CHECK ((char_length(migration_state) <= 255)),
    CONSTRAINT check_97f0249439 CHECK ((char_length(migration_aborted_in_state) <= 255))
);
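The comparison with structure.sql can also be scripted rather than done by eye. The following is a minimal sketch, assuming the live column names have been collected separately (for example from `\d container_repositories` in psql). The helper names are made up for illustration, and the parser only handles the one-column-per-line layout shown above; it is not a general SQL parser.

```python
import re


def columns_from_create_table(sql):
    """Extract column names from a single CREATE TABLE statement.

    Lines starting with CONSTRAINT are skipped. This is a rough parser for
    the structure.sql layout (one column per line), nothing more.
    """
    body = sql[sql.index("(") + 1 : sql.rindex(")")]
    columns = []
    for line in body.splitlines():
        line = line.strip().rstrip(",")
        if not line or line.upper().startswith("CONSTRAINT"):
            continue
        match = re.match(r'"?([A-Za-z_][A-Za-z0-9_]*)"?\s', line)
        if match:
            columns.append(match.group(1))
    return columns


def missing_columns(expected_sql, actual_columns):
    """Return expected columns the live table lacks, in schema order."""
    actual = set(actual_columns)
    return [c for c in columns_from_create_table(expected_sql) if c not in actual]
```

Passing the statement above and the column list of the damaged table to `missing_columns` gives the list of columns to investigate before touching the table.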

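Everything from the migration_state column onward was missing. This is where I made my second mistake: I took the reliability of my own server's disk for granted. In the heat of the moment I assumed that only this one table was broken. If so, I could just rebuild it: I had the CREATE TABLE statement, so drop it and create it again. It did not look like an important table anyway, so losing its contents seemed like no big deal. I logged in to the database and went for it (only half joking):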

drop table container_repositories;

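PostgreSQL warned that the table was referenced by foreign keys and that dropping it would affect other objects. I was past caring by then. Drop, drop, drop!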

drop table container_repositories cascade;

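I force-dropped the table, recreated it with the statement above, and restarted!

Fifth error

Slap! The reality check came fast: the fifth error was even more absurd. I can no longer remember what exactly it was. This is when I remembered that Google holds the distilled wisdom of mankind. Pia!(ｏ ‵-′)ノ”(ノ﹏<。)

Google patted me on the head and said: no, I don't!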

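Not that this was going to stop me. I searched everywhere for GitLab recovery write-ups, and persistence eventually paid off: I found an article about a similar case (https://tech.uupt.com/?p=147). After comparing it with my situation, I reached the following conclusions:

GitLab is down.
The problem is in GitLab's database.
The whole database is affected, not just one table.
The database was never mounted in from the data directory, so on startup GitLab treated itself as a fresh installation and had already overwritten a large part of the data.
The other files are probably fine for now.

At that thought my heart went cold. Five years of code lived in there. Five years!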

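After calming down, I figured I had the following options:

Best option: recover fully within the original environment, restoring the lost database by analysing the files on disk. This gives the strongest guarantee that nothing else goes wrong.
Middle option: migrate all the repository data under git-data into a freshly built GitLab.
Worst option: give it all up and start over from scratch!!!

By now three hours had passed since the incident. I realised that with a 50 GB disk and all my repeated restarts, my poor database files had probably not survived intact. If even a few of the several hundred tables were damaged, fixing them would be the end of me, and who knows what bizarre bugs would stay hidden. As for the worst option, it was not even worth considering: losing this would be about the same as losing everything I had accumulated.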

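So all the pressure fell on the middle option. I could think of the following ways to recover:

Recover from git-data: extract every repository from git-data and restore them.
Recover from the repositories on the five computers I use day to day. My private GitLab has strict access control and only these five machines can push code, so finding the latest copy of each repository on them and pushing it back would also work.

But with more than 500 repositories, I would have to track them down one by one and compare the timestamps of each repository across five machines. That workload was more than I could handle, so the pressure moved back to the first choice: recovering from git-data.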

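As I imagined it, recovery from git-data could go one of three ways:

git-data is directly usable.
GitLab supports importing git-data directly.
A tool extracts the repositories from git-data.

Full of hope, I took a look at the repository files under git-data, and got a bucket of cold water in return.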

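Nothing but hashes. GitLab has stored repositories in hashed mode since version 13, and by version 14 the original repository layout had been removed entirely. I felt I was about to be toast.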
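As far as I understand GitLab's hashed storage documentation, each directory name is the SHA-256 hex digest of the numeric project ID, nested under its first two character pairs. The mapping from ID to project name lives in the database, which is exactly what was lost here, so the path alone does not reveal the name. Below is a small sketch of the path derivation; the function name is an illustration, not a GitLab API.

```python
import hashlib


def hashed_repo_path(project_id):
    """Return the relative hashed-storage path for a numeric project ID."""
    digest = hashlib.sha256(str(project_id).encode()).hexdigest()
    return f"@hashed/{digest[:2]}/{digest[2:4]}/{digest}.git"
```

This only helps when the project IDs are known, for example from an old export or a surviving database dump.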

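That road was closed, so I moved on to the others. I fired up Google and searched frantically for options two and three, and found nothing. At that point I realised I had only one road left.

The search for repositories begins

git-data looked like a mess, but as a directory that stores repositories it must follow some convention.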

find ./ |grep config

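This turned up a number of config files, which showed that git-data is just a set of directories without readable names, and each config even contains the repository name. That made things clear. The plan came down to the following steps:

Extract all the git repositories.
Set up a new GitLab and do the basic configuration.
Create the repositories in GitLab.
Push each git repository to it.
Done.

I found code on Stack Overflow for extracting all the repositories. To be safe, I backed up the GitLab files under test so that I could experiment freely.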

for GITDIR in $(find /root/test/gitlab/data/git-data/repositories/@hashed/ -maxdepth 3 -type d -name '*[0-9a-f].git'); do
   echo "$(cat ${GITDIR}/config | grep fullpath | awk -F " = " '{print $2}')   $GITDIR"
done 


xxxxxxx/xxxxxxxxx   /root/test/gitlab/data/git-data/repositories/@hashed/xx/xx/xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx.git
xxxxxxx/xxxxxxxxx   /root/test/gitlab/data/git-data/repositories/@hashed/xx/xx/xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx.git
xxxxxxx/xxxxxxxxx   /root/test/gitlab/data/git-data/repositories/@hashed/xx/xx/xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx.git
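The same listing can be produced in Python, which makes it easier to spot repositories whose config has no `fullpath` entry. This is a minimal sketch rather than the script actually used; the function names are made up, and it assumes the `@hashed/xx/yy/<hash>.git` layout shown in the output above.

```python
from pathlib import Path


def read_fullpath(config_file):
    """Return the `fullpath` value from a bare repo's config, or None if absent."""
    config = Path(config_file)
    if not config.is_file():
        return None
    for line in config.read_text(errors="replace").splitlines():
        key, sep, value = line.partition("=")
        if sep and key.strip() == "fullpath":
            return value.strip()
    return None


def list_hashed_repos(hashed_root):
    """Yield (fullpath or None, repo_dir) for every project repo under @hashed.

    Names such as *.wiki.git are skipped, mirroring the '*[0-9a-f].git'
    pattern of the shell loop.
    """
    for repo in sorted(Path(hashed_root).glob("*/*/*.git")):
        stem = repo.name[: -len(".git")]
        if not repo.is_dir() or "." in stem:
            continue
        yield read_fullpath(repo / "config"), repo
```

Entries that come back with `None` as the name are the ones that need manual attention.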

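OK, step one done. Step two: set up a new GitLab. Here I kept only the config directory and mounted empty directories for everything else, fixed the permissions with chmod, started the container, then changed the user password, adjusted the configuration files, and so on.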

docker run -it --log-opt max-size=10m --log-opt max-file=3 -p 443:443 --name gitlab --restart always -v /root/gitlab/config:/etc/gitlab -v /root/gitlab/logs:/var/log/gitlab -v /root/gitlab/data:/var/opt/gitlab gitlab/gitlab-ce:latest

gitlab-rails console
user = User.where(id:1).first
user.password = 'newpassword'
user.password_confirmation = 'newpassword'
user.save!

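Step two was also done without trouble. Next came step three, putting the repositories back. First I created a repository with the same name, then tried a git push from that repository's directory under git-data. Some repositories are old and still use a master branch, while new repositories default to main, so the push has to try both branches. This also exposed a bad habit: nearly all of my code lives on master, and only a handful of repositories use other branches. I will definitely fix that from now on!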

cd /root/test/gitlab/data/git-data/repositories/@hashed/xx/xx/xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx.git
git remote add origin ssh://url/user/xx.git
git push --set-upstream origin master || git push --set-upstream origin main
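The two-branch fallback above publishes only master or main. For repositories with additional branches or tags, an alternative is to push every branch and tag in one go. The sketch below only builds the git commands and is not what was actually run; the function names are illustrative, and `--all` and `--tags` are standard `git push` options.

```python
import subprocess


def push_all_commands(repo_dir, remote_url, remote="origin"):
    """Build git invocations that publish every branch and tag of a bare repo.

    Returns argv lists suitable for subprocess.run(cmd, check=True).
    """
    git = ["git", "-C", str(repo_dir)]
    return [
        git + ["remote", "add", remote, remote_url],
        git + ["push", remote, "--all"],   # every ref under refs/heads/
        git + ["push", remote, "--tags"],  # every ref under refs/tags/
    ]


def run_all(commands):
    """Run each command in order, stopping at the first failure."""
    for cmd in commands:
        subprocess.run(cmd, check=True)
```

Keeping command construction separate from execution makes it easy to print and review the commands for all 500+ repositories before running any of them.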

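Success. In the GitLab web UI all the history was there; on another computer, git pull, git log and git branch all matched the old history. That is when I knew I was safe. Next came the bulk recovery.

Creating 500+ repositories by hand would have cost me my hands, so I requested a token and used it to create the repositories in bulk, then generated the commands with Python and executed them. Perfect.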

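# s is assumed to hold the output of the shell loop above: one "<fullpath>   <repo dir>" entry per line
for line in s.split('\n'):
    if not line.strip():
        continue
    name = line.split()[0].split('/')[1]
    cmd = f'curl --header "PRIVATE-TOKEN: TOKEN" -X POST "https://url/api/v4/projects?name={name}&namespace_id=1"'
    print(cmd)

# mmm is assumed to be the same entries as a list of lines
for entry in mmm:
    fields = entry.split()
    name = fields[0].split('/')[1]
    repo_dir = fields[1]
    print(f'cd {repo_dir}')
    print(f"git remote add origin ssh://url/user/{name}.git")
    print("git push --set-upstream origin master || git push --set-upstream origin main")
    print()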
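Interpolating repository names straight into a curl URL breaks as soon as a name contains a space or an `&`. The sketch below builds the same project-creation call with proper URL encoding. It only constructs the request; `base_url` and the function name are assumptions, while the `/api/v4/projects` endpoint, the `PRIVATE-TOKEN` header and `namespace_id` come from the curl command above.

```python
import urllib.parse
import urllib.request


def build_create_project_request(base_url, token, name, namespace_id):
    """Build (but do not send) a POST /api/v4/projects request."""
    query = urllib.parse.urlencode({"name": name, "namespace_id": namespace_id})
    return urllib.request.Request(
        f"{base_url.rstrip('/')}/api/v4/projects?{query}",
        headers={"PRIVATE-TOKEN": token},
        method="POST",
    )
```

Sending it is then a matter of `urllib.request.urlopen(request)`, which raises `urllib.error.HTTPError` on a 4xx or 5xx response, so failed creations do not pass silently.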

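After running everything, only one repository had no name, for reasons unknown. I handled that one by hand. Over!

GitLab CI/CD repair

I rely heavily on GitLab CI/CD, with roughly 3 million CI/CD records in the history, so gitlab-runner is indispensable. Fortunately there are not many machines running it, only 7, so I logged in to each one, recreated the runners under the same names, and took the chance to clean up a few that were no longer needed.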

cat /etc/gitlab-runner/config.toml


sudo gitlab-runner register --url "" -r='' --name="" --tag-list "" --executor "docker" --docker-image "" --docker-pull-policy="if-not-present" --locked=false --run-untagged=true -n
sudo gitlab-runner register --url "" -r='' --name="" --tag-list "" --executor "shell" --locked=false --run-untagged=true -n

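At that point I had not thought about setting tags, so afterwards I went back and filled in the tags for each runner according to the CI/CD files.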
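Filling in the tags at registration time avoids that second pass. The sketch below generates the register command from a runner description. It uses only the flags that appear in the commands above; the function name and the example values are hypothetical.

```python
import shlex


def register_command(url, token, name, tags, executor, docker_image=None):
    """Build a non-interactive `gitlab-runner register` command line."""
    args = [
        "sudo", "gitlab-runner", "register",
        "--url", url,
        "-r", token,
        "--name", name,
        "--tag-list", ",".join(tags),
        "--executor", executor,
    ]
    if executor == "docker":
        if not docker_image:
            raise ValueError("the docker executor needs a default image")
        args += ["--docker-image", docker_image, "--docker-pull-policy", "if-not-present"]
    args += ["--locked=false", "--run-untagged=true", "-n"]
    # shlex.join quotes any argument containing spaces or shell metacharacters
    return shlex.join(args)
```

The tag lists themselves still have to be collected from the `tags:` entries of each project's CI/CD file.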

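GitLab mail repair

In the new environment I suddenly noticed that GitLab's mail no longer worked. GitLab mail matters a lot to me, both for task reminders and for sign-in notifications. I sent a test mail from the gitlab-rails console:

Notify.test_email('xx@xx','email title','email content desc').deliver_now

and got the following error:

gitlab enable_starttls and :tls are mutually exclusive. Set :tls if you're on an SMTPS connection. Set :enable_starttls if you're on an SMTP connection and using STARTTLS for secure TLS upgrade.

Googling the error showed that it comes from conflicting settings, so I changed the configuration as follows.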

Before:

gitlab_rails['smtp_enable_starttls_auto'] = true
gitlab_rails['smtp_tls'] = true
gitlab_rails['smtp_enable_starttls'] = true

After:
gitlab_rails['smtp_enable_starttls_auto'] = false
gitlab_rails['smtp_tls'] = true
gitlab_rails['smtp_enable_starttls'] = true


gitlab-ctl reconfigure && gitlab-ctl restart

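After the restart the problem was solved.

GitLab backups

Having been burned this badly, I obviously needed more backups. Here is the backup configuration, which keeps the last 7 days of backups: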

gitlab_rails['manage_backup_path'] = true
gitlab_rails['backup_path'] = "/var/opt/gitlab/backups"
gitlab_rails['backup_archive_permissions'] = 0644
gitlab_rails['backup_keep_time'] = 604800

Then I ran gitlab-rake gitlab:backup:create and... what?! Even this throws an error???

{"command":"create","error":"manager: repository empty: repository skipped",-------
(about 100,000 characters omitted)

2023-04-24 16:30:56 UTC -- Deleting tar staging files ...
2023-04-24 16:30:56 UTC -- Cleaning up /var/opt/gitlab/backups/db
2023-04-24 16:30:56 UTC -- Cleaning up /var/opt/gitlab/backups/repositories
2023-04-24 16:30:56 UTC -- Deleting tar staging files ... done
2023-04-24 16:30:56 UTC -- Deleting backups/tmp ...
2023-04-24 16:30:56 UTC -- Deleting backups/tmp ... done
2023-04-24 16:30:56 +0000 -- Deleting backup and restore lock file
rake aborted!

- The following error may or may not appear:
ActiveRecord::StatementInvalid: PG::UndefinedTable: ERROR:  relation "design_management_repositories" does not exist
LINE 8:  WHERE a.attrelid = '"design_management_repositories"'::regc...
                            ^
/opt/gitlab/embedded/lib/ruby/gems/3.0.0/gems/activerecord-6.1.7.2/lib/active_record/connection_adapters/postgresql/database_statements.rb:19:in `exec'
/opt/gitlab/embedded/lib/ruby/gems/3.0.0/gems/activerecord-6.1.7.2/lib/active_record/connection_adapters/postgresql/database_statements.rb:19:in `block (2 levels) in query'

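According to the official documentation (https://docs.gitlab.cn/ee/raketasks/backup_restore.html), GitLab 12.1 and earlier use gitlab-rake gitlab:backup:create, while 12.2 and later use sudo gitlab-backup create. I switched commands and it worked. Scheduled backups run via crontab: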

0 5 * * * bash /root/gitlab/bak.sh >> /var/gitlab_bak.log
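The contents of bak.sh are not shown here. Since the backups configured above land on the same 50 GB disk as GitLab itself, one natural follow-up step is to copy the newest archive somewhere else. The following is a minimal sketch under that assumption. The function name and the off-site directory are hypothetical, and the `*_gitlab_backup.tar` pattern is the archive naming GitLab uses by default. Note that GitLab's backup documentation says the archive does not include gitlab.rb or gitlab-secrets.json, so those need to be copied separately.

```python
import shutil
from pathlib import Path


def copy_latest_backup(backup_dir, offsite_dir):
    """Copy the newest *_gitlab_backup.tar from backup_dir to offsite_dir.

    Returns the destination path, or None when no backup archive exists.
    """
    archives = sorted(
        Path(backup_dir).glob("*_gitlab_backup.tar"),
        key=lambda p: p.stat().st_mtime,
    )
    if not archives:
        return None
    dest_dir = Path(offsite_dir)
    dest_dir.mkdir(parents=True, exist_ok=True)
    # copy2 also preserves the modification time of the archive
    return Path(shutil.copy2(archives[-1], dest_dir / archives[-1].name))
```

With the volume mounts used in this post, the container path /var/opt/gitlab/backups corresponds to /root/gitlab/data/backups on the host.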

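With that, GitLab was restored and five years of code were back.

Summary

It took two evenings to finish restoring GitLab, and I could not help writing this post to remember it by.

Back up before upgrading and never upgrade casually, even if the server has run for 2 years without a shutdown and the system has been upgraded for 2 years without a problem.
When a problem appears, the first thing to do is to google the same problem and preserve the scene.
Keep backups of important data in multiple places, with multiple copies.
Be cautious with anything you are not sure about; otherwise the damage is hard to predict.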

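Originally published on the WeChat official account ChaMd5安全团队 (ChaMd5 Security Team): Recovering gitlab-ce after losing its database

Copyright notice: posted by admin on April 30, 2023, 8:00 AM.
When reposting, please credit: Recovering gitlab-ce after losing its database | CTF导航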
