How did the client get there?
Before diving into the details, let's try to understand how an app might end up in this state. We start with a simple `users` table. After a few weeks, we need to be able to determine the last sign-in time, so we add `users.last_sign_in_at`. Then we need to know the user's name. We add `first_name` and `last_name`. Twitter handle? Another column. GitHub profile? Phone number? After a few months the table becomes mind-boggling.
What's wrong with this?
A table that large indicates several problems:

- `User` has multiple unrelated responsibilities. This makes it more difficult to understand, change and test.
- Exchanging data between the app and the database requires extra bandwidth.
- The app needs more memory to store the bulky model.

The app fetched `User` on every request for authentication and authorisation purposes but usually used only a handful of columns. Fixing the problem would improve both the design and performance.
Extracting a table
We can solve the problem by extracting seldom-used columns to a new table (or tables). For example, we can extract profile information (`first_name`, etc.) into `profiles` with the following steps:
1. Create `profiles` with columns duplicating the profile-related columns in `users`.
2. Add `profile_id` to `users`. Set it to `NULL` for now.
3. For each row in `users`, insert a row into `profiles` that duplicates the profile-related columns.
4. Point `profile_id` of the corresponding row in `users` to the row inserted in step 3.
5. Do not make `users.profile_id` non-`NULL`. The app isn't aware of its existence yet, so it'd break.
We need to replace references to `users.first_name` with `profiles.first_name` and so on. If we're extracting just a few columns with a handful of references, then I recommend we do this manually. But as soon as we catch ourselves thinking "Oh, no. This is the worst job ever!" we should look for an alternative.
Don't neglect the problem. A part of the code that everyone avoids will deteriorate further and suffer from even greater inattention. The easiest way to break the vicious circle is to start small.
Read on if you're curious how my client solved the problem.
Fixing the code one line at a time
The most incremental approach is fixing one reference to the old column at a time. Let's focus on moving `first_name` from `users` to `profiles`.
First, create `Profile` with:

```shell
rails generate model Profile first_name:string
```
Then add a reference from `users` to `profiles` and copy `users.first_name` to `profiles`:
```ruby
class ExtractUsersFirstNameToProfiles < ActiveRecord::Migration
  # Redefine the models to break dependency on production code. We need
  # vanilla models without callbacks, etc. Also, removing a model in the future
  # might break the migration.
  class User < ActiveRecord::Base; end
  class Profile < ActiveRecord::Base; end

  def up
    add_reference :users, :profile, index: { unique: true }, foreign_key: true

    User.find_each do |user|
      profile = Profile.create!(first_name: user.first_name)
      user.update!(profile_id: profile.id)
    end

    change_column_null :users, :profile_id, false
  end

  def down
    remove_reference :users, :profile
  end
end
```
Because it forces each user to have exactly one profile, a reference from `users` to `profiles` is preferable to the opposite reference.
With the database structure in place, we can delegate `first_name` from `User` to `Profile`. My client had several requirements:

- Accessors should use the associated `Profile`. They should also log where the deprecated accessor was called from.
- Saving `User` should automatically save `Profile` in order to avoid breaking code using the deprecated accessors.
- `User#first_name_changed?` and other `ActiveModel::Dirty` methods should still work.
This means `User` should look like this:
```ruby
class User < ActiveRecord::Base
  # We need autosave as the client code might be unaware of
  # Profile#first_name and still reference User#first_name.
  belongs_to :profile, autosave: true

  def first_name
    log_backtrace(:first_name)
    profile.first_name
  end

  def first_name=(new_first_name)
    log_backtrace(:first_name)

    # Call super so that User#first_name_changed? and similar still work as
    # expected.
    super

    profile.first_name = new_first_name
  end

  private

  def log_backtrace(name)
    filtered_backtrace = caller.select do |item|
      item.start_with?(Rails.root.to_s)
    end
    Rails.logger.warn(<<-END)
A reference to an obsolete attribute #{name} at:
#{filtered_backtrace.join("\n")}
END
  end
end
```
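The filtering in `log_backtrace` can be tried out without Rails. Here is a minimal, framework-free sketch: `APP_ROOT` stands in for `Rails.root` and the frames are made up, but the `select` over `start_with?` is the same idea — only frames from the app itself survive, so the log points at our code rather than at framework internals.

```ruby
# Framework-free sketch of the backtrace filtering in log_backtrace.
# APP_ROOT stands in for Rails.root; the frames below are hypothetical.
APP_ROOT = "/srv/app".freeze

def filter_backtrace(backtrace, root: APP_ROOT)
  backtrace.select { |frame| frame.start_with?(root) }
end

frames = [
  "/srv/app/app/controllers/users_controller.rb:10:in `show'",
  "/usr/lib/ruby/gems/activesupport/lib/callbacks.rb:42:in `run'",
  "/srv/app/app/views/users/show.html.erb:3",
]

# Only the two application frames remain after filtering.
puts filter_backtrace(frames)
```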
After these changes, the app works the same but may be a bit slower because of the extra references to `Profile` (if performance becomes an issue, just use a tool like AppSignal). The code logs all references to the legacy attributes, even ungreppable ones (e.g. `user[attr] = ...` or `user.send("#{attr}=", ...)`), so we'll be able to locate all of them even when `grep` is unhelpful.
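To see why even dynamic writes get caught, consider this toy sketch (a hypothetical class, not the real `User` model): `send` dispatches to the same setter definition as a literal assignment, so the override logs the call even though no `first_name=` appears anywhere in the caller's source.

```ruby
# Toy demonstration: dynamic `send` calls still hit an overridden setter.
class User
  attr_reader :log

  def initialize
    @log = []
  end

  def first_name=(value)
    @log << :first_name_write # stands in for log_backtrace
    @first_name = value
  end
end

user = User.new
attr = "first_name"
user.send("#{attr}=", "Alice") # ungreppable, but still logged
```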
With this infrastructure in place, we can commit to fixing one reference to `users.first_name` on a regular schedule, e.g. every morning (to start the day with a quick win) or around noon (to work on something easier after a focused morning). This commitment is essential because our goal is to lower the mental barriers to fixing the issue. Leaving the code above in place without taking action will degrade the app even further.
After removing all deprecated references (and confirming with `grep` and the logs), we can finally drop `users.first_name`:
```ruby
class RemoveUsersFirstName < ActiveRecord::Migration
  def change
    remove_column :users, :first_name, :string
  end
end
```
We should also get rid of the code added to `User`, as it's no longer necessary.
Limitations
The method might apply to your case, but keep in mind some of its limitations:

- It doesn't handle bulk queries like `User.update_all`.
- It doesn't handle raw SQL queries.
- It may break monkey-patches (remember that dependencies might introduce them too).
- `User` and `Profile` may get out of sync if `profiles.first_name` is updated but `users.first_name` is not.
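The first limitation deserves a quick illustration. `update_all` writes straight to the database without instantiating models, so the deprecated Ruby setter never runs. The toy sketch below simulates that with a direct instance-variable write standing in for the raw SQL write (hypothetical class and names):

```ruby
# Toy demonstration: writes that bypass the setter are never logged,
# just like User.update_all bypasses the deprecated accessor.
class User
  attr_reader :writes_logged

  def initialize
    @writes_logged = 0
  end

  def first_name=(value)
    @writes_logged += 1 # stands in for log_backtrace
    @first_name = value
  end
end

user = User.new
user.first_name = "Alice"                       # goes through the setter: logged
user.instance_variable_set(:@first_name, "Bob") # direct write: not logged
```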
You might be able to overcome some of them. For example, you might keep the models in sync with a service object or a callback on `Profile`. Or, if you use PostgreSQL, you might consider using a materialized view in the interim.
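As a sketch of the sync idea, here is a framework-free service object that writes both the new and the legacy location together. All names are hypothetical; in Rails this logic could live in an `after_save` callback on `Profile` instead.

```ruby
# Framework-free sketch: keep the legacy users.first_name column in sync
# with its new home on profiles by funnelling writes through one object.
UserRow    = Struct.new(:first_name, :profile) # stands in for a users row
ProfileRow = Struct.new(:first_name)           # stands in for a profiles row

class SyncFirstName
  def self.call(user, new_first_name)
    user.profile.first_name = new_first_name # new home of the data
    user.first_name = new_first_name         # legacy column, kept in sync
    user
  end
end

user = UserRow.new("Alice", ProfileRow.new("Alice"))
SyncFirstName.call(user, "Bob")
```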
That's it!
The most important lesson of the article is: do not avoid code that smells; tackle it head-on instead. If the task is overwhelming, work iteratively on a regular schedule. The article presented a method to consider when extracting a table is difficult. If you can't apply it, look for something else. If you have no idea how, just drop me a line. I'll try to help. Don't let your bits rot.