top of page

Fantastic Binary Objects and how to archive them

Writer's picture: ergempergemp

Updated: May 21, 2022

This problem arises when I was in a meeting with one of my clients. It seems the pdf files (I will call them as binary files from now on) are stored in a database and after a while (let says years) database management becomes very difficult due to the size of the database files. My proposition is to archive the binary files to an object storage.


In this post, I am going to demonstrate a couple of methods can be used to archive the binary files in a Postgresql database to Minio by Trino.


In this blog post I am not going deep into the installation of Postgresql, Minio and Trino and just assume they are all installed and working for good. Just for sake I am going to include the links of the products here.



Create Database Table


Now we are going to start by creating a database table as follows to demonstrate a Spring boot application to upload some binary files.


CREATE TABLE public."document" (
	"text" varchar NULL,
	file bytea NULL,
	file_name varchar NULL,
	content_type varchar NULL
);

Demonstrate File Upload to Postgres



As you can just guess there will be an upload html site and uses a spring boot application as a backend. I used the static directory to host the html file so when the spring boot application runs /static directory contents also be served.


Here is the HTML code for the above GUI.

<!DOCTYPE html>
<html >
<body>

<h1>Spring Boot file upload example</h1>

<form method="POST" action="/api/document/pg" enctype="multipart/form-data">
    <input type="file" name="file" /><br/><br/>
    <input type="text" name="text" /><br/><br/>
    <input type="submit" value="Submit" />
</form>

</body>
</html>


The Html code is quite simple, the important part is the backend service which uploads the selected image to postgresql database. As you can see the POST request goes to api/document/pg which takes the multipart form request and saves the request to postgresql database along with the selected file and the form information.


This part should be quite familiar and one can easily guess that after some extensive usage of this service will fill up the database and expanded database operations like backup/restore and the disk management will start to be a pain.


In the following, I am sharing the controller code and the repository code.

@PostMapping("/api/document/pg")
public ResponseEntity uploadDocumentPg(@RequestParam String text, @RequestPart("file") MultipartFile file) {
//public ResponseEntity uploadDocumentPg(Document document) {

    ResponseEntity retVal = ResponseEntity.status(HttpStatus.OK).body("OK");

    Document document = new Document();

    if (file.isEmpty()) {
        retVal =  ResponseEntity.status(HttpStatus.BAD_REQUEST).body("file is empty");
        return retVal;
    }

    try {
        //byte[] bytes = document.getFile().getBytes();
        document.setText(text);
        document.setFile(file.getBytes());
        document.setFileName(file.getOriginalFilename());
        document.setContentType(file.getContentType());
        documentRepo.addDocumentPg(document);
    }
    catch(Exception ex) {
        ex.printStackTrace();
    }
    finally {
        return retVal;
    }
}

public void addDocumentPg(Document document) {
    jdbcTemplate.update("INSERT INTO document(\"text\", file, file_name, content_type) VALUES (?,?,?,?)",
            document.getText(), document.getFile(), document.getFileName(), document.getContentType());
}

Here you can see the PostMapping in the Spring boot application which basically uploads the file to postgresql table which we just created before.



After a while we may (or must) start thinking the archiving of this table to an object storage. I propose two approached. One is using trino to move postgres table to minio. Or just upload the files to minio, and save the minio path in the postgresql database.


Approach 1



As I have stated before, I assume we have a working instance of a minio server and a trino cluster. Only command you need to run is a simple create table as in the trino instance as follows.


create table minio.default.document as select * from postgres.public.document;

Now your postgresql table is replicated to minio. Just to be careful, I have to warn you about the duration of this command which may take a long time if your table is large and you are trying to move the table at once. It is a much better approach to move table part by part by filtering the select command.


You can see from the same replicated table within the Trino.




Now you can alter your spring boot application repository to read from the trino with the same sql which pulls the objects from postgresql as follows. You can find the related GetMapping and the repository as follows.


@GetMapping("/api/document/trino/{text}")
public ResponseEntity getDocumentTrino(@PathVariable String text) {
    ResponseEntity retVal = ResponseEntity.status(HttpStatus.OK).body("OK");

    try {
        Document document = documentRepo.getDocumentByTextTrino(text);

        retVal =  ResponseEntity.ok().
                header(HttpHeaders.CONTENT_DISPOSITION, "attachment;filename=\"" + document.getFileName() + "\"").
                body(new ByteArrayResource(document.getFile()));
    }
    catch(Exception ex) {
        ex.printStackTrace();
    }
    finally {
        return retVal;
    }
}

public Document getDocumentByTextTrino(String text) {

    Document document = new Document();

    try {
        PreparedStatement pStmt = trinoConfig.getConnection().prepareStatement("select text, file, file_name, content_type from minio.default.document where text = ?");

        pStmt.setString(1, text);
        ResultSet rs = pStmt.executeQuery();

        document = DocumentMapper.map(rs);
    } catch (SQLException e) {
        e.printStackTrace();
    }
    finally {
        return document;
    }
}

Quite easy to switch the backend from postgresql to trino, but this approach has some drawbacks. The archival process should continue because the uplaods are still going into the postgresql database. Thats why the second approach may be much efficient which is uploading the document to minio directly and save the access the path to postgresql.


Approach 2


In order to use this approach Repository for the addDocument should be altered as follows. Following code part is basically upload the document to minio and returns the access path to be able to save to postgresql database.


public String addDocumentMinio(Document document) {

    String retVal = "";
    MinioClient minioClient;

    try {

        minioClient =
                MinioClient.builder()
                        .endpoint(new URL("http://127.0.0.1:9000"))
                        //.endpoint("http://127.0.0.1:9000")
                        .credentials("minioadmin", "minioadmin")
                        .build();

        minioClient.putObject(
                PutObjectArgs.builder().bucket("objectstorage").object(document.getFileName()).stream(
                                new ByteArrayInputStream(document.getFile()), -1, 10485760)
                        .contentType(document.getContentType())
                        .build());

        retVal = "objectstorage-" + document.getFileName();

    } catch (ErrorResponseException e) {
        e.printStackTrace();
    } catch (IllegalArgumentException e) {
        e.printStackTrace();
    } catch (InsufficientDataException e) {
        e.printStackTrace();
    } catch (InternalException e) {
        e.printStackTrace();
    } catch (InvalidKeyException e) {
        e.printStackTrace();
    } catch (InvalidResponseException e) {
        e.printStackTrace();
    } catch (IOException e) {
        e.printStackTrace();
    } catch (NoSuchAlgorithmException e) {
        e.printStackTrace();
    } catch (ServerException e) {
        e.printStackTrace();
    } catch (XmlParserException e) {
        e.printStackTrace();
    } catch (Exception ex) {
        ex.printStackTrace();
    }
    finally {
        return retVal;
    }
}
 

You can find the full codes in the following github repo.




Recent Posts

See All

Logical Replication

Main difference of Logical replication in Postgresql is the ability to replicate some tables, instead of replicating the entire database....

Postgresql Replication Series - 3

In this post of the replication, I am going to cover the replication_slots. All the configuration and the infrastructure will be the same...

Comments


github logo

Istanbul, Turkey

  • kisspng-computer-icons-logo-portable-network-graphics-clip-icons-for-free-iconza-circle-so
  • Blogger
  • LinkedIn
Contact

Thanks for submitting!

bottom of page