Discussion:
Storing ruby serialized objects as text or binary?
Clint Pachl
2012-11-09 09:03:03 UTC
Permalink
I am wondering what are the dis/advantages of storing ruby serialized
objects (i.e. Marshal.dump) directly into a `bytea` column versus Base64
encoding the byte stream and storing into a `text` column? Also, are there
any performance concerns with either method? I am using PostgreSQL to be
specific.

One of the advantages I found in base64 encoding first is that it doesn't
screw up my terminal when Sequel logs to STDOUT. Maybe there is a way to
escape or hide this binary data in the logger? However, it seems storing
the serialized object directly, without encoding, would be more efficient.

(this may make this post irrelevant)
Finally, I was unable to reconstitute a ruby object after an insert/select.
I keep getting the error from Marshal.dump, "data too short". It wasn't
until I base64 encoded first that I got it to work. So maybe storing binary
data directly doesn't work? I found a post from 2008, How do you insert
binary data using sequel + postgresql?<https://groups.google.com/forum/#!searchin/sequel-talk/ruby$20binary/sequel-talk/4aRQGNPQ7Po/_hdJMmZ2Ki4J>


Here's what I did:

ds = DB[:core__checkout_snapshots]
checkout_data = Marshal.dump(data)
ds.insert(id: checkout_id, data: checkout_data)
Marshal.load(ds[id: checkout_id][:data])


This failed with the `data` column as type `bytea` and `text`.
--
You received this message because you are subscribed to the Google Groups "sequel-talk" group.
To view this discussion on the web visit https://groups.google.com/d/msg/sequel-talk/-/xc_xJhLlfVoJ.
To post to this group, send email to sequel-talk-/JYPxA39Uh5TLH3MbocFF+G/***@public.gmane.org
To unsubscribe from this group, send email to sequel-talk+unsubscribe-/JYPxA39Uh5TLH3MbocFF+G/***@public.gmane.org
For more options, visit this group at http://groups.google.com/group/sequel-talk?hl=en.
Jeremy Evans
2012-11-09 16:51:24 UTC
Permalink
Post by Clint Pachl
I am wondering what are the dis/advantages of storing ruby serialized
objects (i.e. Marshal.dump) directly into a `bytea` column versus Base64
encoding the byte stream and storing into a `text` column? Also, are there
any performance concerns with either method? I am using PostgreSQL to be
specific.
One of the advantages I found in base64 encoding first is that it doesn't
screw up my terminal when Sequel logs to STDOUT. Maybe there is a way to
escape or hide this binary data in the logger? However, it seems storing
the serialized object directly, without encoding, would be more efficient.
(this may make this post irrelevant)
Finally, I was unable to reconstitute a ruby object after an
insert/select. I keep getting the error from Marshal.dump, "data too
short". It wasn't until I base64 encoded first that I got it to work. So
maybe storing binary data directly doesn't work? I found a post from 2008, How
do you insert binary data using sequel + postgresql?<https://groups.google.com/forum/#!searchin/sequel-talk/ruby$20binary/sequel-talk/4aRQGNPQ7Po/_hdJMmZ2Ki4J>
ds = DB[:core__checkout_snapshots]
checkout_data = Marshal.dump(data)
ds.insert(id: checkout_id, data: checkout_data)
Marshal.load(ds[id: checkout_id][:data])
This failed with the `data` column as type `bytea` and `text`.
When storing data in a bytea column, you need to mark it as a blob:

ds.insert(id: checkout_id, data: Sequel.blob(checkout_data))

That may fix your issue.

Personally, I don't think serialization of ruby objects into a database is
a good idea in most cases, but if you have to do it with Marshal, it's
probably better to store it in bytea instead of base64 encoded text.

About your logger issue, use a custom logger that escapes the output
instead of the default Logger class, that's unrelated to Sequel.

Jeremy
--
You received this message because you are subscribed to the Google Groups "sequel-talk" group.
To view this discussion on the web visit https://groups.google.com/d/msg/sequel-talk/-/Wpb1JJvKpFEJ.
To post to this group, send email to sequel-talk-/JYPxA39Uh5TLH3MbocFF+G/***@public.gmane.org
To unsubscribe from this group, send email to sequel-talk+unsubscribe-/JYPxA39Uh5TLH3MbocFF+G/***@public.gmane.org
For more options, visit this group at http://groups.google.com/group/sequel-talk?hl=en.
Clint Pachl
2012-11-10 09:47:24 UTC
Permalink
Post by Jeremy Evans
Post by Clint Pachl
I am wondering what are the dis/advantages of storing ruby serialized
objects (i.e. Marshal.dump) directly into a `bytea` column versus Base64
encoding the byte stream and storing into a `text` column? Also, are there
any performance concerns with either method? I am using PostgreSQL to be
specific.
One of the advantages I found in base64 encoding first is that it doesn't
screw up my terminal when Sequel logs to STDOUT. Maybe there is a way to
escape or hide this binary data in the logger? However, it seems storing
the serialized object directly, without encoding, would be more efficient.
(this may make this post irrelevant)
Finally, I was unable to reconstitute a ruby object after an
insert/select. I keep getting the error from Marshal.dump, "data too
short". It wasn't until I base64 encoded first that I got it to work. So
maybe storing binary data directly doesn't work? I found a post from 2008, How
do you insert binary data using sequel + postgresql?<https://groups.google.com/forum/#!searchin/sequel-talk/ruby$20binary/sequel-talk/4aRQGNPQ7Po/_hdJMmZ2Ki4J>
ds = DB[:core__checkout_snapshots]
checkout_data = Marshal.dump(data)
ds.insert(id: checkout_id, data: checkout_data)
Marshal.load(ds[id: checkout_id][:data])
This failed with the `data` column as type `bytea` and `text`.
ds.insert(id: checkout_id, data: Sequel.blob(checkout_data))
That may fix your issue.
Yep, that fixed my issue!

I knew the Postgres function I needed to escape the binary data
(i.e. escape_bytea), but I just couldn't find the Sequel method to access
it. I looked at the code, but too many abstraction layers for me to follow.
I'm guessing it is in the C code of the Postgres driver.

Personally, I don't think serialization of ruby objects into a database is
Post by Jeremy Evans
a good idea in most cases, but if you have to do it with Marshal, it's
probably better to store it in bytea instead of base64 encoded text.
I am glad you offered your opinion Jeremy. You made me think about what I
am trying to do. I do indeed want to serialize my objects because they are
short-lived snapshots of a collection of objects. Doing so just makes life
easier. However, I decided that they don't belong in the database. I
decided that I can more easily manage these snapshots using files.

I would also agree that it is better to store a Marshal.dump directly in a
bytea column. For example, upon quick comparison of my snapshots, the
binary data is about about 35% smaller than the base64 encoded text.
Post by Jeremy Evans
About your logger issue, use a custom logger that escapes the output
instead of the default Logger class, that's unrelated to Sequel.
Escaping the binary data using Sequel.blob fixed the garbling of my
terminal. I am in fact using the standard Logger, so no need for a custom
logger. The Postgres escape_bytea function must make the data
terminal-friendly.
--
You received this message because you are subscribed to the Google Groups "sequel-talk" group.
To view this discussion on the web visit https://groups.google.com/d/msg/sequel-talk/-/qRJxbbmVkfEJ.
To post to this group, send email to sequel-talk-/JYPxA39Uh5TLH3MbocFF+G/***@public.gmane.org
To unsubscribe from this group, send email to sequel-talk+unsubscribe-/JYPxA39Uh5TLH3MbocFF+G/***@public.gmane.org
For more options, visit this group at http://groups.google.com/group/sequel-talk?hl=en.
Clint Pachl
2012-11-10 09:49:43 UTC
Permalink
Post by Jeremy Evans
ds.insert(id: checkout_id, data: Sequel.blob(checkout_data))
That may fix your issue.
Yep, that fixed my issue!

I knew the Postgres function I needed to escape the binary data
(i.e. escape_bytea), but I just couldn't find the Sequel method to access
it. I looked at the code, but too many abstraction layers for me to follow.
I'm guessing it is in the C code of the Postgres driver.

Personally, I don't think serialization of ruby objects into a database is
Post by Jeremy Evans
a good idea in most cases, but if you have to do it with Marshal, it's
probably better to store it in bytea instead of base64 encoded text.
I am glad you offered your opinion Jeremy. You made me think about what I
am trying to do. I do indeed want to serialize my objects because they are
short-lived snapshots of a collection of objects. Doing so just makes life
easier. However, I decided that they don't belong in the database. I
decided that I can more easily manage these snapshots using files.

I would also agree that it is better to store a Marshal.dump directly in a
bytea column. For example, upon quick comparison of my snapshots, the
binary data is about about 35% smaller than the base64 encoded text.
Post by Jeremy Evans
About your logger issue, use a custom logger that escapes the output
instead of the default Logger class, that's unrelated to Sequel.
Escaping the binary data using Sequel.blob fixed the garbling of my
terminal. I am in fact using the standard Logger, so no need for a custom
logger. The Postgres escape_bytea function must make the data
terminal-friendly.
--
You received this message because you are subscribed to the Google Groups "sequel-talk" group.
To view this discussion on the web visit https://groups.google.com/d/msg/sequel-talk/-/x7C4DpXNLZAJ.
To post to this group, send email to sequel-talk-/JYPxA39Uh5TLH3MbocFF+G/***@public.gmane.org
To unsubscribe from this group, send email to sequel-talk+unsubscribe-/JYPxA39Uh5TLH3MbocFF+G/***@public.gmane.org
For more options, visit this group at http://groups.google.com/group/sequel-talk?hl=en.
Loading...