In my hands, most current FCS files use float (or double?) representation for intensity. I think that reading their matrix of intensities uses the following call.
|
readBin(con = con, what = dattype, n = count, size = size, |
|
signed = signed, endian=endian) |
readBin() is doing so much stuff that it makes reading a matrix from file very slow. As a workaround, I replaced the matrix reading with a Cpp code. In my workflow, reading and sampling 69 FCS goes from 180 sec to 30 sec. I use
ff <- read.FCS(..., which.lines = 1:5) to get a flowFrame and all information from a FCS file. Then I get intensities matrix using Cpp code and replace the
exprs(ff) of the flowFrame with the matrix. Unfortunately, it is sound difficult/dangereous to replace flowCore code above by infering the C file pointer from the R connection.
The following Cpp code works fine for float and should be adapted for double (or integers).
I think the great developpers of flowCore will perform a better job than me.
#include <Rcpp.h>
#include <fstream>
#include <stdexcept>
using namespace Rcpp;
// [[Rcpp::export]]
NumericMatrix read_fcs_data(
const std::string& file_path,
long byte_offset,
long n_row,
long n_par,
bool swap
) {
// open file in binary mode
std::ifstream con(file_path, std::ios::binary);
if (!con.is_open())
stop("Cannot open file: " + file_path);
// seek to byte offset
con.seekg(byte_offset, std::ios::beg);
if (con.fail())
stop("Failed to seek to offset " + std::to_string(byte_offset));
// read n_vals float32 values into a flat buffer
int n_vals = n_row * n_par;
std::vector<float> buf(n_vals);
con.read(reinterpret_cast<char*>(buf.data()), n_vals * sizeof(float));
if (con.fail())
stop("Failed to read " + std::to_string(n_vals) + " values from file");
// swap bytes if file endian differs from host
if (swap) {
for (int k = 0; k < n_vals; k++) {
char* p = reinterpret_cast<char*>(&buf[k]);
std::swap(p[0], p[3]);
std::swap(p[1], p[2]);
}
}
// fill matrix row-by-row (byrow = TRUE means values are row-major)
NumericMatrix data_mat(n_row, n_par);
for (int row = 0; row < n_row; row++) {
for (int col = 0; col < n_par; col++) {
data_mat(row, col) = static_cast<double>(buf[row * n_par + col]);
}
}
return data_mat;
}
In my hands, most current FCS files use float (or double?) representation for intensity. I think that reading their matrix of intensities uses the following call.
flowCore/R/IO.R
Lines 906 to 907 in e898b05
readBin()is doing so much stuff that it makes reading a matrix from file very slow. As a workaround, I replaced the matrix reading with a Cpp code. In my workflow, reading and sampling 69 FCS goes from 180 sec to 30 sec. I useff <- read.FCS(..., which.lines = 1:5)to get a flowFrame and all information from a FCS file. Then I get intensities matrix using Cpp code and replace theexprs(ff)of the flowFrame with the matrix. Unfortunately, it is sound difficult/dangereous to replace flowCore code above by infering the C file pointer from the R connection.The following Cpp code works fine for float and should be adapted for double (or integers).
I think the great developpers of flowCore will perform a better job than me.