5 Data wrangling
Once data have been spike sorted, we are ready to begin working in R. To get to a point where meaningful preliminary plots can be produced, a few things need to be addressed:
1. Labeling the time series of spike & photodiode data based on the stimulus that appears on screen (i.e., matching the log file to the data file). This includes labeling phases (like “blank”, “stationary”, and “moving”) along with experimental metadata such as SF, TF, and stimulus orientation (direction).
2. Re-organizing the spike & photodiode data so that separate replicates of a stimulus run are uniquely labelled and then arranged together.
3. Binning the data into 10- and 100-ms data sets, and then exporting CSVs of the unbinned, 10-ms-binned, and 100-ms-binned data. This will be highly useful for situations where you are handling multiple data files (e.g., from different recording days), in which case it is likely that your machine will not be able to store all objects in RAM.
Before proceeding any further, please ensure you have installed and loaded all the necessary R packages as detailed in the Preface section of this guide.
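As a quick reminder, the code in this chapter relies on at least the following packages (a minimal sketch; the Preface has the authoritative list and installation instructions):
## Packages used in this chapter
library(tidyverse)  ## dplyr, tidyr, purrr, readr, stringr, tibble
library(R.matlab)   ## readMat(), for importing .mat files
library(zoo)        ## na.locf(), for carrying metadata forward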
5.1 Import example file and metadata
We will use a recently collected data file and its corresponding metadata file to showcase the fundamentals of wrangling ephys data into a more easily plottable format.
5.1.1 Identify files to import
The following code is based on the assumptions that:
- Your files are stored in a directory entitled /data
- The basename of each file (i.e., the name of the file, excluding the file extension) is identical for each set of spike sorted data and its corresponding metadata log file (e.g., 2023-02-16_001.mat and 2023-02-16_001.csv have the same basename, which is 2023-02-16_001).
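For concreteness, these assumptions imply a layout like the following (illustrative only; your dates and number of files will differ):
## ./data/2023-02-16_001.mat    spike sorted data
## ./data/2023-02-16_001.csv    corresponding stimulus log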
## List all files of each file type
csv_files <-
  list.files("./data", pattern = ".csv", full.names = TRUE)
mat_files <-
  list.files("./data", pattern = ".mat", full.names = TRUE)

## Generate metadata tibbles for each file type
csv_file_info <-
  tibble(
    csv_files = csv_files,
    ## Extract the basename by removing the file extension
    basename = basename(csv_files) %>% str_remove(".csv"),
    ## NOTE: PLEASE ADJUST THE FOLLOWING LINE TO BE ABLE TO EXTRACT THE
    ## DATE BASED ON YOUR NAMING CONVENTION
    basedate = basename(csv_files) %>% str_sub(start = 1, end = 12)
  )
mat_file_info <-
  tibble(
    mat_files = mat_files,
    ## Extract the basename by removing the file extension
    basename = basename(mat_files) %>% str_remove(".mat"),
    ## NOTE: AGAIN, PLEASE ADJUST THE FOLLOWING LINE TO BE ABLE TO EXTRACT
    ## THE DATE BASED ON YOUR NAMING CONVENTION
    basedate = basename(mat_files) %>% str_sub(start = 1, end = 12)
  )

## Matchmake between .MAT data and .CSV log files
csv_mat_filejoin <-
  inner_join(csv_file_info, mat_file_info, by = "basename") %>%
  ## OPTIONAL STEP: remove any rows where either the .MAT or .CSV is missing
  drop_na()

## Store a vector of basenames in the environment. This will become useful later
base_names <- csv_mat_filejoin$basename

## Your end products from this code block should look something like:
csv_mat_filejoin
## # A tibble: 1 × 5
## csv_files basename basedate.x mat_files based…¹
## <chr> <chr> <chr> <chr> <chr>
## 1 ./data/2023-02-16_001.csv 2023-02-16_001 2023-02-16_0 ./data/2023-02-… 2023-0…
## # … with abbreviated variable name ¹basedate.y
## and:
base_names
## [1] "2023-02-16_001"
5.1.2 Data import and preliminary labeling
We will now use the R.matlab package to import the .mat file into R and then label the spike and photodiode time series based on the information in the .csv log file.
Because .mat files can be large, data import can take several minutes.
Please see in-line comments for further guidance.
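One note before running the loop: it uses a %not_in% operator, which is not part of base R. If your setup does not already define it (e.g., via the Preface), a minimal definition is:
## Assumed helper: the negation of %in%
`%not_in%` <- Negate(`%in%`)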
## Set up empty vectors that will collect sets of replicates that we will be
## splitting up
metadata_sets <- NULL
meta_splits <- NULL
data_splits <- NULL

gc()
##           used  (Mb) gc trigger  (Mb) max used  (Mb)
## Ncells 2454829 131.2    3878522 207.2  3878522 207.2
## Vcells 4234160  32.4   10146329  77.5  6629589  50.6

starttime <- Sys.time() ## Optional, will help you assess run time

for (i in 1:nrow(csv_mat_filejoin)) {
  ## Which file # are we working on?
  print(i)

  ## Set up temporary objects in which to eventually write data
  csv_data_sets <- NULL
  mat_data_sets <- NULL
  joined_data_sets <- NULL

  ## Import the matlab file. This may take some time
  mat_import <-
    R.matlab::readMat(as.character(csv_mat_filejoin[i, "mat_files"]))

  ## Read in the corresponding csv log file
  csv_data_sets[[i]] <-
    read_csv(as.character(csv_mat_filejoin[i, "csv_files"]),
             show_col_types = FALSE) %>%
    ## Rename columns for convenience
    rename(
      Spatial_Frequency = `Spatial Frequency`,
      Temporal_Frequency = `Temporal Frequency`,
      Cycles_Per_Pixel = `Cycles Per Pixel`
    )

  ## Determine if spatial frequencies need to be transformed
  sfs <- csv_data_sets[[i]]$Spatial_Frequency %>% unique() %>% sort()
  cpps <- c(0.000668, 0.001336, 0.002670, 0.005300, 0.010600, 0.021200)
  if (all(sfs == cpps)) {
    ## If true, convert to cpd using the following mapping:
    csv_data_sets[[i]]$Spatial_Frequency[csv_data_sets[[i]]$Spatial_Frequency == 0.000668] <- 2^-6
    csv_data_sets[[i]]$Spatial_Frequency[csv_data_sets[[i]]$Spatial_Frequency == 0.001336] <- 2^-5
    csv_data_sets[[i]]$Spatial_Frequency[csv_data_sets[[i]]$Spatial_Frequency == 0.00267]  <- 2^-4
    csv_data_sets[[i]]$Spatial_Frequency[csv_data_sets[[i]]$Spatial_Frequency == 0.0053]   <- 2^-3
    csv_data_sets[[i]]$Spatial_Frequency[csv_data_sets[[i]]$Spatial_Frequency == 0.0106]   <- 2^-2
    csv_data_sets[[i]]$Spatial_Frequency[csv_data_sets[[i]]$Spatial_Frequency == 0.0212]   <- 2^-1
  }

  ## The log file does not have time = 0, so set up a separate tibble to
  ## add this info in later. Some of the metadata will just be filler for now.
  initial <- tibble(
    Trial = "initialization",
    Spatial_Frequency = csv_data_sets[[i]]$Spatial_Frequency[1],
    Cycles_Per_Pixel = csv_data_sets[[i]]$Cycles_Per_Pixel[1],
    Temporal_Frequency = csv_data_sets[[i]]$Temporal_Frequency[1],
    Direction = csv_data_sets[[i]]$Direction[1],
    Time = 0.000
  )

  ## Find photodiode
  ## It is almost always in channel 2, but we should be sure to check before
  ## extracting automatically
  photod_default_channel <-
    mat_import[[stringr::str_which(names(mat_import), "Ch2")[1]]]
  if (!photod_default_channel[1][[1]][1] == "waveform") {
    warning("File ", i, ": Photodiode channel identity uncertain")
  }

  ## Find spikes
  ## Similarly, spikes are almost always in channel 5, but we should check
  spikes_default_channel <-
    mat_import[[stringr::str_which(names(mat_import), "Ch5")[1]]]
  if ("codes" %not_in% attributes(spikes_default_channel)$dimnames[[1]]) {
    warning("File ", i, ": Sorted spikes channel identity uncertain")
  }
  ## If that worked, see if we can automatically determine the "times" and
  ## "codes" slot numbers
  times_location <-
    which(attributes(spikes_default_channel)$dimnames[[1]] == "times")
  codes_location <-
    which(attributes(spikes_default_channel)$dimnames[[1]] == "codes")

  ## Find matlab's stimulus change log
  stim_change_channel <-
    mat_import[[stringr::str_which(names(mat_import), "Ch3")[1]]]
  ## Each sweep should be 5 secs. We'll check that the median is 5
  ## If this triggers the warning, then the channel identity could be wrong,
  ## or there may have been an issue with sweep duration during the recording
  ## process
  if (!median(round(diff(stim_change_channel[[5]][, 1]), 0)) == 5) {
    warning("File ", i, ": stim change channel identity uncertain")
  }

  ## Determine when the onset of motion occurred according to matlab
  first_moving_mat <-
    stim_change_channel[[5]][, 1][1]
  ## Find the first "moving" phase in the log file
  first_moving_csv <-
    csv_data_sets[[i]] %>%
    filter(Trial == "moving") %>%
    select(Time) %>%
    slice(1) %>%
    as.numeric()
  ## Find the first "blank" phase in the log file
  first_blank <-
    csv_data_sets[[i]] %>%
    filter(Trial == "blank") %>%
    select(Time) %>%
    slice(1) %>%
    as.numeric()
  ## Compute the difference between these two
  first_mvbl_diff <- first_moving_csv - first_blank

  ## Check to see if the final row of the metadata is "moving" and truncate
  ## if not
  ## This can effectively be done by truncating after the final "moving" phase
  max_moving_sweep <-
    max(which(csv_data_sets[[i]]$Trial == "moving"))
  first_csv_tmp <-
    bind_rows(initial, csv_data_sets[[i]]) %>%
    ## Add 1 to max moving sweep since we tacked on "initial" in previous step
    ## Then slice to restrict any extraneous partial sweeps
    slice_head(n = (max_moving_sweep + 1)) %>%
    ## Add the first event time to "Time" and subtract first_mvbl_diff (~2 secs)
    ## What this does is shift the log csv's time stamping to match the matlab
    ## file's stim change channel's time stamping
    mutate(Time = Time + first_moving_mat - first_mvbl_diff - first_blank) %>%
    ## Make character version of Time for joining later
    ## This will be crucial for _join functions
    mutate(Time_char = as.character(round(Time, 3)))

  ## Duplicate the initialization for ease of setting T0
  inception <-
    initial %>%
    mutate(Time_char = as.character(round(Time, 3)))
  inception$Trial[1] <- "inception"

  ## Bind the initialization rows
  first_csv <-
    bind_rows(inception, first_csv_tmp)

  ## Compute stimulus end times
  first_csv$Stim_end <- c(first_csv$Time[-1], max(first_csv$Time) + 3)

  ## Get final time
  final_time <- first_csv$Stim_end[nrow(first_csv)]

  ## Extract photodiode data
  ## First generate a time sequence to match to the photodiode trace
  Time_vec <- seq(
    from = 0.0000,
    by = 1 / 25000,
    length.out = length(photod_default_channel[9][[1]][, 1])
  )
  ## The key thing is to get a character version of time from this
  Time_char_vec <- as.character(round(Time_vec, 3))

  ## Grab the photodiode data
  photod_full <-
    tibble(Photod = photod_default_channel[9][[1]][, 1])
  ## Add numeric time
  photod_full$Time <-
    seq(
      from = 0.0000,
      by = 1 / 25000,
      length.out = nrow(photod_full)
    )
  options(scipen = 999)
  photod_full <-
    photod_full %>%
    ## Add the character time
    add_column(Time_char = Time_char_vec) %>%
    ## Use the character time to define a group
    group_by(Time_char) %>%
    ## Then average the photodiode within
    summarise(Photod = mean(Photod)) %>%
    ungroup() %>%
    mutate(Time = round(as.numeric(Time_char), 3)) %>%
    arrange(Time) %>%
    filter(Time <= final_time)

  ## Extract all spike data
  all_spike_dat <-
    tibble(
      Time = spikes_default_channel[times_location][[1]][, 1],
      code = spikes_default_channel[codes_location][[1]][, 1]
    ) %>%
    ## Characterize time, for purposes of joining later
    mutate(Time_char = as.character(round(Time, 3)))

  ## How many distinct neurons are there?
  cell_ids <- sort(unique(all_spike_dat$code))
  n_cells <- 1:length(cell_ids)

  if (length(n_cells) > 1) { ## if there's more than one distinct neuron
    all_spike_dat_tmp <-
      all_spike_dat %>%
      ## Group by identity of spiking neuron
      group_by(code) %>%
      ## Split into separate dfs, one per neuron
      group_split()
    ## Additional cells are labeled as "Spikes_n"
    all_cells <- NULL
    for (j in n_cells) {
      #print(j)
      new_name = paste0("Spikes_", cell_ids[j])
      all_cells[[j]] <-
        all_spike_dat_tmp[[j]]
      ## Consolidate to 3 decimal places
      all_cells[[j]] <-
        all_cells[[j]] %>%
        group_by(Time_char) %>%
        summarise(code = mean(code)) %>%
        mutate(code = ceiling(code)) %>%
        ungroup() %>%
        mutate(Time = round(as.numeric(Time_char), 3)) %>%
        arrange(Time) %>%
        filter(Time <= final_time)
      names(all_cells[[j]])[match("code", names(all_cells[[j]]))] <- new_name
      ## Replace the code value with 1 to indicate presence/absence of a spike
      ## rather than cell identity
      all_cells[[j]][new_name] <- 1
      ## If the identity is 1, replace "Spikes_1" with just "Spikes"
      if (new_name == "Spikes_1") {
        names(all_cells[[j]])[match(new_name, names(all_cells[[j]]))] <- "Spikes"
      }
    }
    ## Consolidate
    all_spike_dat <-
      all_cells %>%
      ## Tack on additional spike columns
      reduce(full_join, by = "Time_char") %>%
      arrange(Time_char) %>%
      ## Remove the Time.x/Time.y join artifacts but
      ## do not remove Time_char
      select(-contains("Time.")) %>%
      ## Regenerate numeric time
      mutate(
        Time = as.numeric(Time_char)
      ) %>%
      select(Time, Time_char, Spikes, everything()) %>%
      filter(Time <= final_time)
  } else { ## If there's only 1 neuron
    all_spike_dat <-
      all_spike_dat %>%
      group_by(Time_char) %>%
      summarise(code = mean(code)) %>%
      mutate(code = ceiling(code)) %>%
      ungroup() %>%
      rename(Spikes = code) %>%
      mutate(Time = round(as.numeric(Time_char), 3)) %>%
      arrange(Time) %>%
      filter(Time <= final_time) %>%
      select(Time, Time_char, everything())
  }

  options(scipen = 999)
  mat_data_sets[[i]] <-
    ## Generate a time sequence from 0 to final_time
    tibble(
      Time = seq(from = 0, to = final_time, by = 0.001)
    ) %>%
    ## Character-ize it
    mutate(Time_char = as.character(round(Time, 5))) %>%
    ## Join in the photodiode data
    left_join(photod_full, by = "Time_char") %>%
    select(-Time.y) %>%
    rename(Time = Time.x) %>%
    arrange(Time) %>%
    ## Join in the spike data
    left_join(all_spike_dat, by = "Time_char") %>%
    select(-Time.y) %>%
    rename(Time = Time.x) %>%
    arrange(Time) %>%
    filter(Time <= final_time) %>%
    ## Replace NAs with 0 in the Spike columns only
    mutate(
      across(starts_with("Spikes"), ~replace_na(.x, 0))
    )

  ## Merge the matlab data with the metadata
  joined_one_full <-
    mat_data_sets[[i]] %>%
    ## Join by the character version of time NOT the numerical!!
    full_join(first_csv, by = "Time_char") %>%
    ## Rename columns for clarity of reference
    rename(Time_mat = Time.x,
           Time_csv = Time.y) %>%
    ## Convert character time to numeric time
    mutate(Time = round(as.numeric(Time_char), 3)) %>%
    ## Carry metadata forward
    mutate(
      Trial = zoo::na.locf(Trial, type = "locf"),
      Spatial_Frequency = zoo::na.locf(Spatial_Frequency, type = "locf"),
      Cycles_Per_Pixel = zoo::na.locf(Cycles_Per_Pixel, type = "locf"),
      Temporal_Frequency = zoo::na.locf(Temporal_Frequency, type = "locf"),
      Direction = zoo::na.locf(Direction, type = "locf"),
      Time_csv = zoo::na.locf(Time_csv, type = "locf"),
      Stim_end = zoo::na.locf(Stim_end, type = "locf")
    ) %>%
    ## Calculate velocity
    mutate(
      Speed = round(Temporal_Frequency / Spatial_Frequency, 0),
      Log2_Speed = log2(Speed)
    )

  ## Add info to metadata
  metadata_one_full <-
    first_csv %>%
    mutate(
      Speed = round(Temporal_Frequency / Spatial_Frequency, 0),
      Log2_Speed = log2(Speed),
      Stim_end_diff = c(0, diff(Stim_end))
    )

  ## Some quality control checks
  ## What are the stim time differences?
  stimtime_diffs <- round(metadata_one_full$Stim_end_diff)[-c(1:2)]
  ## How many total reps were recorded?
  stimtime_reps <- length(stimtime_diffs) / 3
  ## What do we expect the overall structure to look like?
  stimtime_expectation <- rep(c(1, 1, 3), stimtime_reps)
  ## Does reality match our expectations?
  if (!all(stimtime_diffs == stimtime_expectation)) {
    ## If you get this, investigate the file further and determine what went
    ## wrong
    print("stimtime issue; investigate")
  }

  ## Sometimes the final sweep gets carried for an indefinite amount of time
  ## before the investigator manually shuts things off. The following block
  ## truncates accordingly
  mark_for_removal <-
    which(round(metadata_one_full$Stim_end_diff) %not_in% c(1, 3))
  if (any(mark_for_removal == 1 | mark_for_removal == 2)) {
    mark_for_removal <- mark_for_removal[mark_for_removal > 2]
  }
  if (length(mark_for_removal) > 0) {
    metadata_sets[[i]] <-
      metadata_one_full %>%
      filter(Time < metadata_one_full[mark_for_removal[1], ]$Time)
    joined_data_sets[[i]] <-
      joined_one_full %>%
      filter(Time_mat < metadata_one_full[mark_for_removal[1], ]$Time)
  } else {
    metadata_sets[[i]] <- metadata_one_full
    joined_data_sets[[i]] <- joined_one_full
  }

  ## Organize the metadata for export in the R environment
  meta_splits[[i]] <-
    metadata_sets[[i]] %>%
    ## Get rid of the non-sweep info
    filter(!Trial == "inception") %>%
    filter(!Trial == "initialization") %>%
    ## Group by trial
    #group_by(Spatial_Frequency, Temporal_Frequency, Direction) %>%
    ## Split by trial group
    group_split(Spatial_Frequency, Temporal_Frequency, Direction)
  data_splits[[i]] <-
    joined_data_sets[[i]] %>%
    ## Get rid of the non-sweep info
    filter(!Trial == "inception") %>%
    filter(!Trial == "initialization") %>%
    ## Group by trial
    group_by(Spatial_Frequency, Temporal_Frequency, Direction) %>%
    ## Split by trial group
    group_split()

  ## Do some cleanup so large objects don't linger in memory
  rm(
    first_csv, inception, initial, mat_import, first_csv_tmp,
    photod_default_channel, spikes_default_channel, photod_full,
    all_spike_dat, all_spike_dat_tmp, all_cells,
    metadata_one_full, joined_one_full, joined_data_sets,
    csv_data_sets, mat_data_sets
  )
  message("File ", i, ": ", csv_mat_filejoin[i, "basename"], " imported")
  gc()
}
## [1] 1

endtime <- Sys.time()
endtime - starttime ## Total elapsed time
## Time difference of 2.67673 mins
## Tidy up how R has been using RAM by running garbage collection
gc()
## used (Mb) gc trigger (Mb) max used (Mb)
## Ncells 5596189 298.9 16196008 865.0 25306262 1351.6
## Vcells 204523381 1560.4 915557392 6985.2 1430558425 10914.3
## Name each data set according to the basename of the file
names(metadata_sets) <- csv_mat_filejoin$basename #base_names
names(meta_splits) <- csv_mat_filejoin$basename #base_names
names(data_splits) <- csv_mat_filejoin$basename #base_names

## Get organized lists of stimuli that were used
## This will ultimately be used for arranging data by stimuli in a sensible
## order
metadata_combos <- NULL
for (i in 1:length(metadata_sets)) {
  metadata_combos[[i]] <-
    metadata_sets[[i]] %>%
    ## Get unique stimulus parameters
    distinct(Spatial_Frequency, Temporal_Frequency, Speed, Direction) %>%
    arrange(Direction) %>%
    ## Sort by SF (largest to smallest, via desc())
    arrange(desc(Spatial_Frequency)) %>%
    mutate(
      ## Set a "plot_order" which provides a running order of stimuli
      plot_order = 1:length(meta_splits[[i]]),
      name = paste(Direction, "Deg,", Speed, "Deg/s")
    )
}
names(metadata_combos) <- csv_mat_filejoin$basename #base_names
What we now have is the following:
- metadata_sets: contains a list of tibbles (one per imported file), each of which comprises the stimulus log
- metadata_combos: contains a list of tibbles (one per imported file), each of which comprises the stimulus log re-written in a preferred plotting order
- meta_splits: a list of lists. The first level of the list corresponds to the file identities. Within each file's list, there is an additional set of lists, each of which contains a tibble with the log info for a specific stimulus combination. Essentially the same as metadata_sets but split up on a per-stimulus basis.
- data_splits: another list of lists, arranged in the same hierarchical order as meta_splits. Each tibble herein contains the spike and photodiode data from the matlab file on a per-stimulus basis.

Note that since we are only using one example file, each of these lists should only be 1 element long (with the latter two having additional elements within the first element).
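Before moving on, a few quick sanity checks can confirm this hierarchy (illustrative; the exact lengths and names depend on your files):
## How many files were imported?
length(metadata_sets)     ## 1 here, since we imported a single file
## What are the data sets named?
names(data_splits)        ## "2023-02-16_001"
## How many distinct stimulus combinations are in the first file?
length(data_splits[[1]])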
5.2 Organizing replicates (required) and binning (optional)
It is common to record several different sweeps of a stimulus to collect (somewhat) independent replicates of neural responses. The primary task of this section will be to use the lists from the previous section to gather & label replicates of the same stimulus.
A secondary task is to deal with binning, if desired (highly recommended). Depending on the goals of plotting and/or analyses, it may be wise to work with either unbinned spike data or to bin the data at a convenient and sensible interval. I generally choose to work at the following levels:
- Unbinned
- Bin size = 10 ms
- Bin size = 100 ms
Important note: Rather than provide separate code for each of these 3 scenarios, I will provide one example. Bin size must be declared beforehand – please pay attention.
## Set bin size here
## Units are in ms (e.g. 10 = 10ms)
bin_size = 10 ## 10 or 100 or 1 (1 = "unbinned")

slice_size = NULL
slicemin = NULL
slicemax = NULL
condition = NULL
if (bin_size == 10){
  slice_size <- 501
  slicemin <- 202
  slicemax <- 498
  condition <- "_binsize10"
} else if (bin_size == 100){
  slice_size <- 51
  slicemin <- 21
  slicemax <- 49
  condition <- "_binsize100"
} else if (bin_size == 1){
  slice_size <- NULL
  slicemin <- NULL
  slicemax <- NULL
  condition <- "_unbinned"
} else {
  stop("bin_size is non-standard")
}
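To make the upcoming bin logic concrete: within each sweep, rows (one row per ms) are assigned bin IDs via rep(), exactly as in the loop below. Here is a toy illustration (n_rows and toy_bin are hypothetical stand-ins, used so as not to overwrite bin_size):
## Toy example of the bin assignment used inside the loop below
n_rows <- 25   ## hypothetical number of 1-ms rows in a sweep
toy_bin <- 10  ## stand-in for bin_size
rep(1:ceiling(n_rows / toy_bin), each = toy_bin)[1:n_rows]
##  [1] 1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 2 2 3 3 3 3 3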
all_replicate_data_reorganized <-
  vector(mode = "list", length = length(meta_splits))
name_sets <-
  vector(mode = "list", length = length(meta_splits))

gc()
##            used   (Mb) gc trigger   (Mb)   max used    (Mb)
## Ncells  5596004  298.9   16196008  865.0   25306262  1351.6
## Vcells 204513787 1560.4  732445914 5588.2 1430558425 10914.3

starttime <- Sys.time()
for (i in 1:length(meta_splits)){
  ## i = file number
  print(i)

  ## We'll need to collect data at a per-stimulus level and on a per-replicate
  ## level within the per-stimulus level
  ## "j" will be used to designate a unique stimulus
  ## We'll first create an empty object in which to collect stimulus-specific
  ## data
  replicate_data_reorganized <- NULL

  ## For each of j unique stimuli...
  for (j in 1:length(meta_splits[[i]])) { # j = {direction,speed}
    ## Isolate the j-th data
    d <- data_splits[[i]][[j]]
    ## And the j-th log data
    m <- meta_splits[[i]][[j]] %>%
      group_by(Trial) %>%
      ## Label separate replicates
      mutate(Replicate = row_number())

    ## Extract a stimulus label to a name_set that will be used later
    name_sets[[i]][[j]] <-
      paste(m$Direction[1], "Deg,", m$Speed[1], "Deg/s")

    ## Set up a temporary object to deal with per-replicate data
    replicates_ordered <- NULL

    ## "k" will be used to designate replicate number
    for (k in 1:max(m$Replicate)){
      tmp <-
        m %>%
        filter(Replicate == k)

      ## If you have a complete replicate (i.e., blank, stationary, moving)
      if (nrow(tmp) == 3) {
        ## Grab the specific data
        doot <-
          d %>%
          filter(Time >= min(tmp$Time)) %>%
          filter(Time <= max(tmp$Stim_end))
        ## Add bin information
        doot$bin <-
          rep(1:ceiling(nrow(doot)/bin_size), each = bin_size)[1:nrow(doot)]

        if (bin_size == 1) {
          ## IF YOU ARE NOT BINNING, RUN THIS:
          replicates_ordered[[k]] <-
            doot %>%
            mutate(
              ## Construct a standardized time within the sweep
              Time_stand = Time_mat - min(Time_mat),
              ## When does the sweep begin
              Time_begin = min(Time_mat),
              ## When does the sweep end
              Time_end = max(Time_mat),
              ## Delineate the end of the blank phase
              Blank_end = tmp$Stim_end[1] - min(Time_mat),
              ## Delineate the end of the stationary phase
              Static_end = tmp$Stim_end[2] - min(Time_mat),
              ## Label the replicate number
              Replicate = k
            ) %>%
            ## Bring stim info to first few columns
            select(Speed, Spatial_Frequency, Temporal_Frequency, Direction,
                   everything()) %>%
            ## Just in case there is some hangover
            filter(Time_stand >= 0)
        } else { ## IF YOU ARE BINNING, RUN THIS:
          ## First grab time and meta info
          time_and_meta <-
            doot %>%
            ## WITHIN EACH BIN:
            group_by(bin) %>%
            summarise(
              ## Label the trial
              Trial = first(Trial),
              ## Midpoint of bin
              Time_bin_mid = mean(Time_mat),
              ## Bin beginning
              Time_bin_begin = min(Time_mat),
              ## Bin end
              Time_bin_end = max(Time_mat),
              ## Spike rate = sum of spikes divided by elapsed time
              Spike_rate = sum(Spikes) / (max(Time_mat) - min(Time_mat)),
              Photod_mean = mean(Photod)
            )

          ## Now deal with Spike_N columns
          hold_spike_n <-
            doot %>%
            select(starts_with("Spikes_")) %>%
            add_column(bin = doot$bin) %>%
            add_column(Time_mat = doot$Time_mat) %>%
            group_by(bin) %>%
            summarise(across(starts_with("Spikes_"),
                             ~ sum(.x) / (max(Time_mat) - min(Time_mat))))

          ## Put them together
          replicates_ordered[[k]] <-
            time_and_meta %>%
            left_join(hold_spike_n, by = "bin") %>%
            mutate(
              ## Add in metadata (following same definitions above)
              Time_stand = Time_bin_mid - min(Time_bin_mid),
              Blank_end = tmp$Stim_end[1] - min(Time_bin_mid),
              Static_end = tmp$Stim_end[2] - min(Time_bin_mid),
              Spatial_Frequency = m$Spatial_Frequency[1],
              Temporal_Frequency = m$Temporal_Frequency[1],
              Speed = m$Speed[1],
              Direction = m$Direction[1],
              Replicate = k
            ) %>%
            ## Bring stim info to first few columns
            select(Speed, Spatial_Frequency, Temporal_Frequency, Direction,
                   everything()) %>%
            ## Just in case there is some hangover
            filter(Time_stand >= 0) %>%
            filter(bin < slice_size + 1)
          rm(time_and_meta, hold_spike_n)
        }
      }
    }

    ## Now insert it within the collector of per-stimulus data
    replicate_data_reorganized[[j]] <-
      replicates_ordered %>%
      bind_rows()

    ## Now insert it within the overall data collector
    all_replicate_data_reorganized[[i]][[j]] <-
      replicate_data_reorganized[[j]]

    ## Toss out temporary objects and clean up
    rm(replicates_ordered, d, m, tmp)
    gc()
  }
}
## [1] 1

endtime <- Sys.time()
endtime - starttime
## Time difference of 2.527167 mins

gc()
##            used   (Mb) gc trigger   (Mb)   max used    (Mb)
## Ncells  5600940  299.2   16196008  865.0   25306262  1351.6
## Vcells 208863355 1593.6  585956732 4470.5 1430558425 10914.3

for (i in 1:length(all_replicate_data_reorganized)) {
  for (j in 1:length(all_replicate_data_reorganized[[i]])) {
    names(all_replicate_data_reorganized[[i]])[[j]] <- name_sets[[i]][[j]]
  }
}
names(all_replicate_data_reorganized) <- csv_mat_filejoin$basename #base_names
At the end of this process, here is how all_replicate_data_reorganized[[1]] should look:
## How long is this list?
length(all_replicate_data_reorganized[[1]])
## [1] 80
## This object contains 48 individual tibbles. Here's an example of one
## pulled at random
all_replicate_data_reorganized[[1]][sample(1:48, size = 1)]
## $`45 Deg, 128 Deg/s`
## # A tibble: 3,507 × 16
## Speed Spatial_F…¹ Tempo…² Direc…³ bin Trial Time_…⁴ Time_…⁵ Time_…⁶ Spike…⁷
## <dbl> <dbl> <dbl> <dbl> <int> <chr> <dbl> <dbl> <dbl> <dbl>
## 1 128 0.0442 5.66 45 1 blank 99.3 99.3 99.3 0
## 2 128 0.0442 5.66 45 2 blank 99.4 99.4 99.4 0
## 3 128 0.0442 5.66 45 3 blank 99.4 99.4 99.4 0
## 4 128 0.0442 5.66 45 4 blank 99.4 99.4 99.4 111.
## 5 128 0.0442 5.66 45 5 blank 99.4 99.4 99.4 0
## 6 128 0.0442 5.66 45 6 blank 99.4 99.4 99.4 0
## 7 128 0.0442 5.66 45 7 blank 99.4 99.4 99.4 0
## 8 128 0.0442 5.66 45 8 blank 99.4 99.4 99.4 0
## 9 128 0.0442 5.66 45 9 blank 99.4 99.4 99.4 0
## 10 128 0.0442 5.66 45 10 blank 99.4 99.4 99.4 0
## # … with 3,497 more rows, 6 more variables: Photod_mean <dbl>, Spikes_0 <dbl>,
## # Time_stand <dbl>, Blank_end <dbl>, Static_end <dbl>, Replicate <int>, and
## # abbreviated variable names ¹Spatial_Frequency, ²Temporal_Frequency,
## # ³Direction, ⁴Time_bin_mid, ⁵Time_bin_begin, ⁶Time_bin_end, ⁷Spike_rate
## To consolidate this, you can use bind_rows()
all_replicate_data_reorganized[[1]] %>% bind_rows()
## # A tibble: 284,067 × 16
## Speed Spatial_F…¹ Tempo…² Direc…³ bin Trial Time_…⁴ Time_…⁵ Time_…⁶ Spike…⁷
## <dbl> <dbl> <dbl> <dbl> <int> <chr> <dbl> <dbl> <dbl> <dbl>
## 1 1024 0.0156 16 0 1 blank 181. 181. 181. 0
## 2 1024 0.0156 16 0 2 blank 181. 181. 181. 0
## 3 1024 0.0156 16 0 3 blank 181. 181. 181. 0
## 4 1024 0.0156 16 0 4 blank 181. 181. 181. 0
## 5 1024 0.0156 16 0 5 blank 181. 181. 181. 0
## 6 1024 0.0156 16 0 6 blank 181. 181. 181. 0
## 7 1024 0.0156 16 0 7 blank 181. 181. 181. 0
## 8 1024 0.0156 16 0 8 blank 181. 181. 181. 222.
## 9 1024 0.0156 16 0 9 blank 181. 181. 181. 222.
## 10 1024 0.0156 16 0 10 blank 181. 181. 181. 111.
## # … with 284,057 more rows, 6 more variables: Photod_mean <dbl>,
## # Spikes_0 <dbl>, Time_stand <dbl>, Blank_end <dbl>, Static_end <dbl>,
## # Replicate <int>, and abbreviated variable names ¹Spatial_Frequency,
## # ²Temporal_Frequency, ³Direction, ⁴Time_bin_mid, ⁵Time_bin_begin,
## # ⁶Time_bin_end, ⁷Spike_rate
Bear in mind that I am specifically describing all_replicate_data_reorganized[[1]], NOT all_replicate_data_reorganized. This is because all_replicate_data_reorganized will itself be a list that is as long as the number of distinct data files you are feeding into all of the above. Since there is only 1 example file, all_replicate_data_reorganized is a list that is only 1 entry long at its top level, and within that list is a set of 48 lists (one per distinct stimulus), each of which contains a stim-specific tibble of data.
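Because the top level was named by file basename and each entry by stimulus label (via name_sets), you can also pull a specific tibble by name rather than by index. A small sketch, using names from the example output above:
## Pull one stimulus' data by name
all_replicate_data_reorganized[["2023-02-16_001"]][["45 Deg, 128 Deg/s"]]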
5.3 Data export
It is highly recommended that you export all_replicate_data_reorganized. I generally export all_replicate_data_reorganized as a separate csv file for each of the following conditions:
- Unbinned
- Bin size = 10 ms
- Bin size = 100 ms
This section will provide code to export each of the above, assuming you used those bin sizes in the previous section. Please be sure to change file naming conventions and other parameters in the event you chose a different bin_size in the previous section.
## Declare export destination
export_path <- "./data/"
## The "condition" will be appended to the file name.
## Export each tibble within all_replicate_data_reorganized
for (i in 1:length(all_replicate_data_reorganized)) {
  print(i)
  dat <-
    all_replicate_data_reorganized[[i]] %>%
    bind_rows()
  write_csv(
    dat,
    file =
      paste0(
        export_path,
        names(all_replicate_data_reorganized)[i],
        condition,
        ".csv"
      )
  )
  rm(dat)
}
Re-run the previous section with each bin_size you seek to export, then re-run the chunk above each time. For me, this process creates 3 files: 2023-02-16_001_unbinned.csv, 2023-02-16_001_binsize10.csv, and 2023-02-16_001_binsize100.csv.
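If you need these data in a fresh R session later, the exported files can be read straight back in. A minimal sketch (the file name is from the example above; dat_10ms is a hypothetical name):
## Re-import one of the exported files
dat_10ms <- read_csv("./data/2023-02-16_001_binsize10.csv",
                     show_col_types = FALSE)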