Speed up read.FCS()

In my experience, most current FCS files store intensities as floats (or possibly doubles). I believe their intensity matrix is read by the following call:
https://github.com/RGLab/flowCore/blob/e898b0590f29eaf35ba24c568706e3af17736cc2/R/IO.R#L906-L907
`readBin()` does so much work that reading a matrix from a file becomes very slow. As a workaround, I replaced the matrix reading with C++ code. In my workflow, reading and sampling 69 FCS files goes from 180 s to 30 s. I call `ff <- read.FCS(..., which.lines = 1:5)` to get a flowFrame with all the metadata from an FCS file. Then I read the intensity matrix with the C++ code and replace `exprs(ff)` of the flowFrame with that matrix. Unfortunately, replacing the flowCore code above by inferring the C file pointer from the R connection sounds difficult and dangerous.
The following C++ code works fine for floats and would need to be adapted for doubles (or integers).
I think the great developers of flowCore will do a better job than I can.

```
#include <Rcpp.h>
#include <cstddef>
#include <fstream>
#include <string>
#include <utility>
#include <vector>

using namespace Rcpp;

// Read n_row x n_par 32-bit floats stored event by event (row-major),
// starting at byte_offset, and return them as an R numeric matrix.
// [[Rcpp::export]]
NumericMatrix read_fcs_data(
        const std::string& file_path,
        long byte_offset,
        long n_row,
        long n_par,
        bool swap
) {
    if (byte_offset < 0 || n_row < 0 || n_par < 0)
        stop("byte_offset, n_row and n_par must be non-negative");

    // open file in binary mode
    std::ifstream con(file_path, std::ios::binary);
    if (!con.is_open())
        stop("Cannot open file: " + file_path);

    // seek to byte offset
    con.seekg(byte_offset, std::ios::beg);
    if (con.fail())
        stop("Failed to seek to offset " + std::to_string(byte_offset));

    // read n_vals float32 values into a flat buffer
    // (size_t avoids overflowing int when n_row * n_par is large)
    const std::size_t n_vals =
        static_cast<std::size_t>(n_row) * static_cast<std::size_t>(n_par);
    std::vector<float> buf(n_vals);
    con.read(reinterpret_cast<char*>(buf.data()),
             static_cast<std::streamsize>(n_vals * sizeof(float)));
    if (con.fail())
        stop("Failed to read " + std::to_string(n_vals) + " values from file");

    // swap bytes if file endian differs from host
    if (swap) {
        for (std::size_t k = 0; k < n_vals; k++) {
            char* p = reinterpret_cast<char*>(&buf[k]);
            std::swap(p[0], p[3]);
            std::swap(p[1], p[2]);
        }
    }

    // the file is row-major (one event after another) while R matrices are
    // column-major, so each value is placed individually
    NumericMatrix data_mat(n_row, n_par);
    for (long row = 0; row < n_row; row++) {
        for (long col = 0; col < n_par; col++) {
            data_mat(row, col) = static_cast<double>(
                buf[static_cast<std::size_t>(row) * n_par + col]);
        }
    }

    return data_mat;
}
```
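
The reader above handles 32-bit floats only. One possible way to cover doubles as well is to template the element type and reverse bytes generically. The sketch below is plain standard C++ (no Rcpp) so it can be compiled on its own. The names `swap_bytes` and `read_values` are hypothetical and not part of flowCore, and the result is returned as a flat column-major vector, which is the layout R uses for matrices.

```cpp
#include <algorithm>
#include <cstddef>
#include <fstream>
#include <stdexcept>
#include <string>
#include <vector>

// Reverse the bytes of one value in place (works for any fixed-size type).
template <typename T>
void swap_bytes(T& value) {
    char* p = reinterpret_cast<char*>(&value);
    std::reverse(p, p + sizeof(T));
}

// Hypothetical helper: read n_row * n_par values of type T stored event by
// event (row-major) and return them as doubles in column-major order.
template <typename T>
std::vector<double> read_values(const std::string& file_path,
                                std::streamoff byte_offset,
                                std::size_t n_row,
                                std::size_t n_par,
                                bool swap) {
    std::ifstream con(file_path, std::ios::binary);
    if (!con.is_open())
        throw std::runtime_error("Cannot open file: " + file_path);

    con.seekg(byte_offset, std::ios::beg);
    if (con.fail())
        throw std::runtime_error("Failed to seek to the data segment");

    const std::size_t n_vals = n_row * n_par;
    std::vector<T> buf(n_vals);
    con.read(reinterpret_cast<char*>(buf.data()),
             static_cast<std::streamsize>(n_vals * sizeof(T)));
    if (con.fail())
        throw std::runtime_error("Failed to read values from file");

    std::vector<double> out(n_vals);
    for (std::size_t row = 0; row < n_row; ++row) {
        for (std::size_t col = 0; col < n_par; ++col) {
            const std::size_t idx = row * n_par + col;
            // swap inside the buffer so the raw bytes are never copied
            // around as a (possibly invalid) floating-point value
            if (swap) swap_bytes(buf[idx]);
            out[col * n_row + row] = static_cast<double>(buf[idx]);
        }
    }
    return out;
}
```

With this shape, `read_values<float>(...)` and `read_values<double>(...)` share one code path. An Rcpp wrapper could pick the instantiation from the data type and bit width declared in the FCS header.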

Speed up read.FCS() #280

For reference, the flowCore lines linked above are:

	readBin(con = con, what = dattype, n = count, size = size,
	        signed = signed, endian=endian)
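
`readBin()` takes the file's byte order through its `endian` argument, whereas a raw C++ reader only needs to know whether to swap bytes. A minimal sketch of that mapping follows, assuming the byte order is given as the string "little" or "big". The helper names `host_is_little_endian` and `needs_swap` are hypothetical.

```cpp
#include <cstdint>
#include <cstring>
#include <stdexcept>
#include <string>

// Return true when the host stores the least significant byte first.
bool host_is_little_endian() {
    const std::uint32_t probe = 1;
    unsigned char first_byte = 0;
    std::memcpy(&first_byte, &probe, 1);
    return first_byte == 1;
}

// Hypothetical helper: decide whether values read raw from disk must be
// byte-swapped on this host, given the file's byte order as a string.
bool needs_swap(const std::string& file_endian) {
    if (file_endian != "little" && file_endian != "big")
        throw std::invalid_argument("endian must be \"little\" or \"big\"");
    const bool file_is_little = (file_endian == "little");
    return file_is_little != host_is_little_endian();
}
```

The result of `needs_swap()` could then be passed as the `swap` flag of a reader that loads the raw bytes directly.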
